Overview

Dataset statistics

Number of variables: 41
Number of observations: 59400
Missing cells: 46094
Missing cells (%): 1.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 18.6 MiB
Average record size in memory: 328.0 B
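
Two of the figures above can be re-derived from the raw counts. Missing cells (%) is the missing-cell count divided by the total number of cells (observations × variables). The average record size is the in-memory total divided by the number of observations. The sketch below assumes 1 MiB = 1024 × 1024 bytes and starts from the rounded 18.6 MiB, so its record size lands slightly above the reported 328.0 B:

```python
# Re-derive two "Dataset statistics" figures from the raw counts.
n_vars = 41
n_obs = 59400
missing_cells = 46094
total_size_mib = 18.6  # rounded value as reported; 1 MiB = 1024 * 1024 bytes (assumption)

missing_pct = 100 * missing_cells / (n_obs * n_vars)
avg_record_bytes = total_size_mib * 1024 * 1024 / n_obs

print(f"Missing cells (%): {missing_pct:.1f}%")          # Missing cells (%): 1.9%
print(f"Average record size: {avg_record_bytes:.1f} B")  # Average record size: 328.3 B
```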

Variable types

Numeric: 10
Categorical: 29
Boolean: 2

Warnings

recorded_by has constant value "GeoData Consultants Ltd" Constant
date_recorded has a high cardinality: 356 distinct values High cardinality
funder has a high cardinality: 1897 distinct values High cardinality
installer has a high cardinality: 2145 distinct values High cardinality
wpt_name has a high cardinality: 37400 distinct values High cardinality
subvillage has a high cardinality: 19287 distinct values High cardinality
lga has a high cardinality: 125 distinct values High cardinality
ward has a high cardinality: 2092 distinct values High cardinality
scheme_name has a high cardinality: 2696 distinct values High cardinality
gps_height is highly correlated with construction_year High correlation
region_code is highly correlated with district_code High correlation
district_code is highly correlated with region_code High correlation
construction_year is highly correlated with gps_height High correlation
gps_height is highly correlated with population and 1 other field High correlation
population is highly correlated with gps_height and 1 other field High correlation
construction_year is highly correlated with gps_height and 1 other field High correlation
population is highly correlated with construction_year High correlation
construction_year is highly correlated with population High correlation
management_group is highly correlated with management and 3 other fields High correlation
district_code is highly correlated with region and 2 other fields High correlation
source_class is highly correlated with source_type and 3 other fields High correlation
payment_type is highly correlated with region and 2 other fields High correlation
water_quality is highly correlated with quality_group High correlation
source_type is highly correlated with source_class and 7 other fields High correlation
gps_height is highly correlated with region and 3 other fields High correlation
waterpoint_type_group is highly correlated with source_class and 7 other fields High correlation
management is highly correlated with management_group and 2 other fields High correlation
region is highly correlated with district_code and 17 other fields High correlation
basin is highly correlated with gps_height and 7 other fields High correlation
longitude is highly correlated with region and 5 other fields High correlation
construction_year is highly correlated with gps_height and 3 other fields High correlation
quantity_group is highly correlated with management_group and 1 other field High correlation
extraction_type_class is highly correlated with payment_type and 8 other fields High correlation
region_code is highly correlated with district_code and 4 other fields High correlation
extraction_type is highly correlated with source_class and 8 other fields High correlation
quantity is highly correlated with management_group and 1 other field High correlation
source is highly correlated with source_class and 8 other fields High correlation
quality_group is highly correlated with water_quality High correlation
extraction_type_group is highly correlated with source_type and 6 other fields High correlation
latitude is highly correlated with district_code and 6 other fields High correlation
waterpoint_type is highly correlated with source_type and 6 other fields High correlation
payment is highly correlated with payment_type and 2 other fields High correlation
scheme_management is highly correlated with management_group and 4 other fields High correlation
water_quality is highly correlated with quality_group and 1 other field High correlation
source_type is highly correlated with source_class and 2 other fields High correlation
waterpoint_type_group is highly correlated with extraction_type_class and 4 other fields High correlation
basin is highly correlated with region and 1 other field High correlation
extraction_type_class is highly correlated with waterpoint_type_group and 4 other fields High correlation
quality_group is highly correlated with water_quality and 1 other field High correlation
waterpoint_type is highly correlated with waterpoint_type_group and 4 other fields High correlation
status_group is highly correlated with recorded_by High correlation
scheme_management is highly correlated with management_group and 2 other fields High correlation
management_group is highly correlated with scheme_management and 2 other fields High correlation
source_class is highly correlated with source_type and 2 other fields High correlation
payment_type is highly correlated with recorded_by and 1 other field High correlation
management is highly correlated with scheme_management and 2 other fields High correlation
permit is highly correlated with recorded_by High correlation
region is highly correlated with basin and 1 other field High correlation
quantity_group is highly correlated with recorded_by and 1 other field High correlation
recorded_by is highly correlated with water_quality and 21 other fields High correlation
public_meeting is highly correlated with recorded_by High correlation
extraction_type is highly correlated with waterpoint_type_group and 4 other fields High correlation
quantity is highly correlated with quantity_group and 1 other field High correlation
source is highly correlated with source_type and 2 other fields High correlation
payment is highly correlated with payment_type and 1 other field High correlation
extraction_type_group is highly correlated with waterpoint_type_group and 4 other fields High correlation
funder has 3635 (6.1%) missing values Missing
installer has 3655 (6.2%) missing values Missing
public_meeting has 3334 (5.6%) missing values Missing
scheme_management has 3877 (6.5%) missing values Missing
scheme_name has 28166 (47.4%) missing values Missing
permit has 3056 (5.1%) missing values Missing
amount_tsh is highly skewed (γ1 = 57.80779995) Skewed
num_private is highly skewed (γ1 = 91.93374999) Skewed
id is uniformly distributed Uniform
id has unique values Unique
amount_tsh has 41639 (70.1%) zeros Zeros
gps_height has 20438 (34.4%) zeros Zeros
longitude has 1812 (3.1%) zeros Zeros
num_private has 58643 (98.7%) zeros Zeros
population has 21381 (36.0%) zeros Zeros
construction_year has 20709 (34.9%) zeros Zeros
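
The warnings above are rule-based flags computed column by column. The sketch below shows how four of them (missing values, constant value, high cardinality, zeros) could be derived for one column stored as a Python list, with None marking a missing value. Both thresholds are hypothetical placeholders, not the values pandas-profiling v3.0.0 actually uses; the real settings live in the profiling configuration (config.json).

```python
# Hypothetical thresholds for illustration only; the real values come from the
# profiling configuration and may differ.
CARDINALITY_THRESHOLD = 50   # more distinct text values than this => "high cardinality"
ZEROS_THRESHOLD = 0.10       # a larger share of zeros than this => "zeros" warning


def column_warnings(name, values):
    """Return warning strings for one column; None marks a missing value."""
    warnings = []
    n = len(values)
    present = [v for v in values if v is not None]

    n_missing = n - len(present)
    if n_missing:
        warnings.append(f"{name} has {n_missing} ({100 * n_missing / n:.1f}%) missing values")

    distinct = len(set(present))
    if distinct == 1:
        warnings.append(f'{name} has constant value "{present[0]}"')
    elif distinct > CARDINALITY_THRESHOLD and all(isinstance(v, str) for v in present):
        warnings.append(f"{name} has a high cardinality: {distinct} distinct values")

    # bool is excluded so that True/False columns are not counted as 1/0.
    n_zeros = sum(
        1 for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool) and v == 0
    )
    if n and n_zeros / n > ZEROS_THRESHOLD:
        warnings.append(f"{name} has {n_zeros} ({100 * n_zeros / n:.1f}%) zeros")

    return warnings


print(column_warnings("recorded_by", ["GeoData Consultants Ltd"] * 4))
print(column_warnings("amount_tsh", [0, 0, 0, 20, 500]))
```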

Reproduction

Analysis started: 2021-09-15 20:22:55.670660
Analysis finished: 2021-09-15 20:23:34.942422
Duration: 39.27 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
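
The reported duration is the difference between the two timestamps. A quick check with the standard library:

```python
from datetime import datetime

started = datetime.fromisoformat("2021-09-15 20:22:55.670660")
finished = datetime.fromisoformat("2021-09-15 20:23:34.942422")

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")  # Duration: 39.27 seconds
```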

Variables

id
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 59400
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37115.13177
Minimum: 0
Maximum: 74247
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 464.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 3730.9
Q1: 18519.75
Median: 37061.5
Q3: 55656.5
95th percentile: 70564.05
Maximum: 74247
Range: 74247
Interquartile range (IQR): 37136.75
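
The quartiles are order statistics of the column. pandas, which pandas-profiling builds on, interpolates linearly between neighbouring values by default, and this report is assumed to follow that default. Python's `statistics.quantiles` with `method="inclusive"` uses the same linear rule. The raw column is not part of this report, so the sketch runs on a toy list and then checks the IQR arithmetic for `id`:

```python
import statistics


def quartiles(values):
    """Q1, median and Q3 with linear interpolation between order statistics."""
    return statistics.quantiles(values, n=4, method="inclusive")


q1, median, q3 = quartiles(list(range(11)))  # toy data: 0, 1, ..., 10
print(q1, median, q3, "IQR =", q3 - q1)      # 2.5 5.0 7.5 IQR = 5.0

# IQR for id, from the reported Q1 and Q3
print(55656.5 - 18519.75)                    # 37136.75
```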

Descriptive statistics

Standard deviation: 21453.12837
Coefficient of variation (CV): 0.5780156866
Kurtosis: -1.201515029
Mean: 37115.13177
Median Absolute Deviation (MAD): 18568.5
Skewness: 0.00262253035
Sum: 2204638827
Variance: 460236716.9
Monotonicity: Not monotonic
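
Two of these statistics are easy to misread. The coefficient of variation is the standard deviation divided by the mean (21453.12837 / 37115.13177 ≈ 0.578), and the variance is the square of the standard deviation. The MAD shown here is the median of the absolute deviations from the median, not the mean absolute deviation, which is why amount_tsh, with 70.1% zeros, reports a MAD of 0. The sketch below assumes a sample standard deviation (ddof = 1, the pandas default):

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation (ddof=1) divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def median_absolute_deviation(values):
    """Median of absolute deviations from the median, with no scaling factor."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


# Check against the reported id figures
print(round(21453.12837 / 37115.13177, 6))               # 0.578016
# A zero-heavy toy column, shaped like amount_tsh
print(median_absolute_deviation([0, 0, 0, 0, 20, 500]))  # 0.0
```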
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
0 | 1 | < 0.1%
19811 | 1 | < 0.1%
38200 | 1 | < 0.1%
34106 | 1 | < 0.1%
36155 | 1 | < 0.1%
46396 | 1 | < 0.1%
48445 | 1 | < 0.1%
42302 | 1 | < 0.1%
70984 | 1 | < 0.1%
73033 | 1 | < 0.1%
Other values (59390) | 59390 | > 99.9%

Minimum 10 values
Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
74247 | 1 | < 0.1%
74246 | 1 | < 0.1%
74243 | 1 | < 0.1%
74242 | 1 | < 0.1%
74240 | 1 | < 0.1%
74239 | 1 | < 0.1%
74238 | 1 | < 0.1%
74237 | 1 | < 0.1%
74236 | 1 | < 0.1%
74235 | 1 | < 0.1%
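
The histograms use 50 equal-width bins between the column's minimum and maximum; for id the bin width is 74247 / 50 ≈ 1484.9. The function below sketches that binning in plain Python. It follows the `numpy.histogram` convention of closing the last bin on the right so that the maximum is counted. Whether the report's plotting code bins in exactly this way is an assumption.

```python
def fixed_width_histogram(values, bins=50):
    """Count values into `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant column
    counts = [0] * bins
    for v in values:
        # The last bin is closed on the right, so the maximum falls into it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


edges, counts = fixed_width_histogram(list(range(10)), bins=5)
print(counts)
```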

amount_tsh
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 98
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 317.6503847
Minimum: 0
Maximum: 350000
Zeros: 41639
Zeros (%): 70.1%
Negative: 0
Negative (%): 0.0%
Memory size: 464.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 20
95th percentile: 1200
Maximum: 350000
Range: 350000
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 2997.574558
Coefficient of variation (CV): 9.436709989
Kurtosis: 4903.543102
Mean: 317.6503847
Median Absolute Deviation (MAD): 0
Skewness: 57.80779995
Sum: 18868432.85
Variance: 8985453.232
Monotonicity: Not monotonic
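
Skewness and kurtosis here are bias-adjusted sample estimates. Kurtosis is reported as excess kurtosis, so a normal distribution scores about 0 and the near-uniform id column scores about -1.2. The sketch implements the adjusted Fisher-Pearson skewness (G1) and the adjusted excess kurtosis (G2), the estimators behind pandas' `Series.skew` and `Series.kurt`. That the report calls those pandas methods is an assumption.

```python
def _central_moments(xs):
    """Return n and the 2nd, 3rd and 4th central moments (population form)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return n, m2, m3, m4


def sample_skewness(xs):
    """Adjusted Fisher-Pearson skewness G1 (needs n > 2 and non-constant data)."""
    n, m2, m3, _ = _central_moments(xs)
    g1 = m3 / m2 ** 1.5
    return (n * (n - 1)) ** 0.5 / (n - 2) * g1


def sample_excess_kurtosis(xs):
    """Adjusted excess kurtosis G2 (needs n > 3 and non-constant data)."""
    n, m2, _, m4 = _central_moments(xs)
    g2 = m4 / m2 ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


print(sample_skewness([0, 0, 0, 10]))           # right-skewed, like amount_tsh
print(sample_excess_kurtosis([1, 2, 3, 4, 5]))  # flat, like id
```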
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
0 | 41639 | 70.1%
500 | 3102 | 5.2%
50 | 2472 | 4.2%
1000 | 1488 | 2.5%
20 | 1463 | 2.5%
200 | 1220 | 2.1%
100 | 816 | 1.4%
10 | 806 | 1.4%
30 | 743 | 1.3%
2000 | 704 | 1.2%
Other values (88) | 4947 | 8.3%

Minimum 10 values
Value | Count | Frequency (%)
0 | 41639 | 70.1%
0.2 | 3 | < 0.1%
0.25 | 1 | < 0.1%
1 | 3 | < 0.1%
2 | 13 | < 0.1%
5 | 376 | 0.6%
6 | 190 | 0.3%
7 | 69 | 0.1%
9 | 1 | < 0.1%
10 | 806 | 1.4%

Maximum 10 values
Value | Count | Frequency (%)
350000 | 1 | < 0.1%
250000 | 1 | < 0.1%
200000 | 1 | < 0.1%
170000 | 1 | < 0.1%
138000 | 1 | < 0.1%
120000 | 1 | < 0.1%
117000 | 7 | < 0.1%
100000 | 3 | < 0.1%
70000 | 1 | < 0.1%
60000 | 1 | < 0.1%

date_recorded
Categorical

HIGH CARDINALITY

Distinct: 356
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Value | Count
2011-03-15 | 572
2011-03-17 | 558
2013-02-03 | 546
2011-03-14 | 520
2011-03-16 | 513
Other values (351) | 56691

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 594000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
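
Python's `unicodedata.category` returns the two-letter general-category code that the Unicode Standard assigns to each character. The "Most occurring categories" tables below summarise exactly these categories. The long names in this sketch come from a hand-written mapping covering only the categories that appear in this report. On the first three sample dates it gives the same 80% / 20% split between Decimal Number and Dash Punctuation as the full column:

```python
import unicodedata
from collections import Counter

# Hand-written map from two-letter codes to the long names used in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter", "Ll": "Lowercase Letter", "Nd": "Decimal Number",
    "Zs": "Space Separator", "Pd": "Dash Punctuation", "Po": "Other Punctuation",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation", "Pc": "Connector Punctuation",
    "Sc": "Currency Symbol", "Sk": "Modifier Symbol",
}


def category_counts(values):
    """Count characters per Unicode general category across string values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


sample = ["2011-03-14", "2013-03-06", "2013-02-25"]
print(category_counts(sample))
```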

Unique

Unique: 35
Unique (%): 0.1%
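
Unique counts the values that occur exactly once, which is not the same as Distinct (the number of different values). For date_recorded, 35 of the 356 dates appear on a single row. The sketch assumes missing values are left out of both the count and the percentage denominator. The funder figures later in the report (974 unique values, 1.7%, with 3635 missing) are consistent with that denominator.

```python
from collections import Counter


def unique_stats(values):
    """Number of values seen exactly once, and their share (%) of non-missing rows."""
    present = [v for v in values if v is not None]
    if not present:
        return 0, 0.0
    counts = Counter(present)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_unique, 100 * n_unique / len(present)


print(unique_stats(["2011-03-14", "2011-03-14", "2013-03-06", None, "2013-02-25"]))
```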

Sample

1st row: 2011-03-14
2nd row: 2013-03-06
3rd row: 2013-02-25
4th row: 2013-01-28
5th row: 2011-07-13

Common Values

Value | Count | Frequency (%)
2011-03-15 | 572 | 1.0%
2011-03-17 | 558 | 0.9%
2013-02-03 | 546 | 0.9%
2011-03-14 | 520 | 0.9%
2011-03-16 | 513 | 0.9%
2011-03-18 | 497 | 0.8%
2011-03-19 | 466 | 0.8%
2013-02-04 | 464 | 0.8%
2013-01-29 | 459 | 0.8%
2011-03-04 | 458 | 0.8%
Other values (346) | 54347 | 91.5%
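
Each Common Values table lists the ten most frequent values. The remainder is folded into an "Other values (k)" row, where k is the number of remaining distinct values, and a "(Missing)" row is added when needed. Frequencies are shares of all 59400 rows with missing values included; for example, funder's 9084 rows for Government Of Tanzania give 15.3%. A sketch of that layout:

```python
from collections import Counter


def common_values(values, top=10):
    """Rows of (label, count, percent of all rows), in the report's layout."""
    n = len(values)
    counts = Counter(v for v in values if v is not None)
    most = counts.most_common(top)
    rows = [(str(v), c, 100 * c / n) for v, c in most]

    other_distinct = len(counts) - len(most)
    other_count = sum(counts.values()) - sum(c for _, c in most)
    if other_distinct:
        rows.append((f"Other values ({other_distinct})", other_count, 100 * other_count / n))

    n_missing = n - sum(counts.values())
    if n_missing:
        rows.append(("(Missing)", n_missing, 100 * n_missing / n))
    return rows


for label, count, pct in common_values(["a", "a", "b", "c", None], top=1):
    print(f"{label} | {count} | {pct:.1f}%")
```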

Length

[Figure: Histogram of lengths of the category]

Words
Value | Count | Frequency (%)
2011-03-15 | 572 | 1.0%
2011-03-17 | 558 | 0.9%
2013-02-03 | 546 | 0.9%
2011-03-14 | 520 | 0.9%
2011-03-16 | 513 | 0.9%
2011-03-18 | 497 | 0.8%
2011-03-19 | 466 | 0.8%
2013-02-04 | 464 | 0.8%
2013-01-29 | 459 | 0.8%
2011-03-04 | 458 | 0.8%
Other values (346) | 54347 | 91.5%

Most occurring characters

Value | Count | Frequency (%)
0 | 139059 | 23.4%
1 | 129012 | 21.7%
- | 118800 | 20.0%
2 | 103867 | 17.5%
3 | 52820 | 8.9%
7 | 12853 | 2.2%
4 | 10712 | 1.8%
8 | 9363 | 1.6%
6 | 6154 | 1.0%
5 | 6034 | 1.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 475200 | 80.0%
Dash Punctuation | 118800 | 20.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 139059 | 29.3%
1 | 129012 | 27.1%
2 | 103867 | 21.9%
3 | 52820 | 11.1%
7 | 12853 | 2.7%
4 | 10712 | 2.3%
8 | 9363 | 2.0%
6 | 6154 | 1.3%
5 | 6034 | 1.3%
9 | 5326 | 1.1%

Dash Punctuation
Value | Count | Frequency (%)
- | 118800 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 594000 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 139059 | 23.4%
1 | 129012 | 21.7%
- | 118800 | 20.0%
2 | 103867 | 17.5%
3 | 52820 | 8.9%
7 | 12853 | 2.2%
4 | 10712 | 1.8%
8 | 9363 | 1.6%
6 | 6154 | 1.0%
5 | 6034 | 1.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 594000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 139059 | 23.4%
1 | 129012 | 21.7%
- | 118800 | 20.0%
2 | 103867 | 17.5%
3 | 52820 | 8.9%
7 | 12853 | 2.2%
4 | 10712 | 1.8%
8 | 9363 | 1.6%
6 | 6154 | 1.0%
5 | 6034 | 1.0%

funder
Categorical

HIGH CARDINALITY
MISSING

Distinct: 1897
Distinct (%): 3.4%
Missing: 3635
Missing (%): 6.1%
Memory size: 464.2 KiB

Value | Count
Government Of Tanzania | 9084
Danida | 3114
Hesawa | 2202
Rwssp | 1374
World Bank | 1349
Other values (1892) | 38642
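
Distinct (%) is computed over the non-missing rows, not over all 59400. For funder, 1897 distinct values over 59400 - 3635 = 55765 present values gives 3.4%, whereas dividing by 59400 would give 3.2%. A one-line check:

```python
def distinct_pct(n_distinct, n_rows, n_missing):
    """Distinct values as a percentage of non-missing rows."""
    return 100 * n_distinct / (n_rows - n_missing)


print(f"funder: {distinct_pct(1897, 59400, 3635):.1f}%")      # funder: 3.4%
print(f"subvillage: {distinct_pct(19287, 59400, 371):.1f}%")  # subvillage: 32.7%
```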

Length

Max length: 30
Median length: 6
Mean length: 9.929902268
Min length: 1

Characters and Unicode

Total characters: 553741
Distinct characters: 69
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 974
Unique (%): 1.7%

Sample

1st row: Roman
2nd row: Grumeti
3rd row: Lottery Club
4th row: Unicef
5th row: Action In A

Common Values

Value | Count | Frequency (%)
Government Of Tanzania | 9084 | 15.3%
Danida | 3114 | 5.2%
Hesawa | 2202 | 3.7%
Rwssp | 1374 | 2.3%
World Bank | 1349 | 2.3%
Kkkt | 1287 | 2.2%
World Vision | 1246 | 2.1%
Unicef | 1057 | 1.8%
Tasaf | 877 | 1.5%
District Council | 843 | 1.4%
Other values (1887) | 33332 | 56.1%
(Missing) | 3635 | 6.1%

Length

[Figure: Histogram of lengths of the category]

Words
Value | Count | Frequency (%)
of | 9748 | 10.8%
government | 9276 | 10.3%
tanzania | 9172 | 10.1%
danida | 3123 | 3.5%
world | 2789 | 3.1%
water | 2645 | 2.9%
hesawa | 2203 | 2.4%
bank | 1416 | 1.6%
rwssp | 1376 | 1.5%
kkkt | 1370 | 1.5%
Other values (2065) | 47254 | 52.3%
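
The Words table counts lower-cased tokens across all values, which is why "of", "government" and "tanzania" each appear more than 9000 times. The sketch assumes a simple tokenization rule: lower-case, split on whitespace, strip punctuation from the ends of each token. That rule is an assumption, not taken from pandas-profiling's source. Under it, a token made only of punctuation, such as "/", becomes an empty string.

```python
import string
from collections import Counter


def word_counts(values):
    """Lower-case each value, split on whitespace, strip edge punctuation."""
    counts = Counter()
    for value in values:
        if value is None:  # missing values contribute no words
            continue
        for token in value.lower().split():
            counts[token.strip(string.punctuation)] += 1
    return counts


sample = ["Government Of Tanzania", "World Bank", "World Vision", None]
print(word_counts(sample).most_common(3))
```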

Most occurring characters

Value | Count | Frequency (%)
a | 68200 | 12.3%
n | 57842 | 10.4%
i | 38011 | 6.9%
e | 37464 | 6.8%
(space) | 34673 | 6.3%
r | 27879 | 5.0%
t | 23016 | 4.2%
o | 22741 | 4.1%
s | 17208 | 3.1%
d | 15464 | 2.8%
Other values (59) | 211243 | 38.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 425880 | 76.9%
Uppercase Letter | 89705 | 16.2%
Space Separator | 34673 | 6.3%
Other Punctuation | 1322 | 0.2%
Decimal Number | 803 | 0.1%
Open Punctuation | 437 | 0.1%
Close Punctuation | 431 | 0.1%
Dash Punctuation | 323 | 0.1%
Connector Punctuation | 167 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
T | 12110 | 13.5%
G | 10722 | 12.0%
O | 10613 | 11.8%
D | 7928 | 8.8%
W | 7352 | 8.2%
C | 4679 | 5.2%
R | 4454 | 5.0%
H | 3462 | 3.9%
M | 3135 | 3.5%
K | 2962 | 3.3%
Other values (16) | 22288 | 24.8%

Lowercase Letter
Value | Count | Frequency (%)
a | 68200 | 16.0%
n | 57842 | 13.6%
i | 38011 | 8.9%
e | 37464 | 8.8%
r | 27879 | 6.5%
t | 23016 | 5.4%
o | 22741 | 5.3%
s | 17208 | 4.0%
d | 15464 | 3.6%
f | 15329 | 3.6%
Other values (16) | 102726 | 24.1%

Other Punctuation
Value | Count | Frequency (%)
/ | 783 | 59.2%
. | 469 | 35.5%
\ | 33 | 2.5%
& | 26 | 2.0%
' | 11 | 0.8%

Decimal Number
Value | Count | Frequency (%)
0 | 793 | 98.8%
2 | 5 | 0.6%
1 | 2 | 0.2%
9 | 2 | 0.2%
4 | 1 | 0.1%

Open Punctuation
Value | Count | Frequency (%)
( | 434 | 99.3%
[ | 3 | 0.7%

Close Punctuation
Value | Count | Frequency (%)
) | 429 | 99.5%
] | 2 | 0.5%

Space Separator
Value | Count | Frequency (%)
(space) | 34673 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 167 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 323 | 100.0%
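
The per-category tables rank characters within each Unicode general category, so their percentages are shares of that category's total rather than of all characters. For example, 793 of the 803 digits in funder are "0", which gives 98.8%. A sketch keyed by the standard library's two-letter category codes (Lu, Ll, Nd and so on):

```python
import unicodedata
from collections import Counter, defaultdict


def chars_per_category(values, top=10):
    """Top characters per Unicode category as (char, count, % of category)."""
    groups = defaultdict(Counter)
    for value in values:
        for ch in value:
            groups[unicodedata.category(ch)][ch] += 1
    result = {}
    for cat, counter in groups.items():
        total = sum(counter.values())
        result[cat] = [(ch, c, round(100 * c / total, 1)) for ch, c in counter.most_common(top)]
    return result


print(chars_per_category(["(KKKT)", "Rwssp"]))
```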

Most occurring scripts

Value | Count | Frequency (%)
Latin | 515585 | 93.1%
Common | 38156 | 6.9%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 68200 | 13.2%
n | 57842 | 11.2%
i | 38011 | 7.4%
e | 37464 | 7.3%
r | 27879 | 5.4%
t | 23016 | 4.5%
o | 22741 | 4.4%
s | 17208 | 3.3%
d | 15464 | 3.0%
f | 15329 | 3.0%
Other values (42) | 192431 | 37.3%

Common
Value | Count | Frequency (%)
(space) | 34673 | 90.9%
0 | 793 | 2.1%
/ | 783 | 2.1%
. | 469 | 1.2%
( | 434 | 1.1%
) | 429 | 1.1%
- | 323 | 0.8%
_ | 167 | 0.4%
\ | 33 | 0.1%
& | 26 | 0.1%
Other values (7) | 26 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 553741 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 68200 | 12.3%
n | 57842 | 10.4%
i | 38011 | 6.9%
e | 37464 | 6.8%
(space) | 34673 | 6.3%
r | 27879 | 5.0%
t | 23016 | 4.2%
o | 22741 | 4.1%
s | 17208 | 3.1%
d | 15464 | 2.8%
Other values (59) | 211243 | 38.1%

gps_height
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 2428
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 668.2972391
Minimum: -90
Maximum: 2770
Zeros: 20438
Zeros (%): 34.4%
Negative: 1496
Negative (%): 2.5%
Memory size: 464.2 KiB

Quantile statistics

Minimum: -90
5th percentile: 0
Q1: 0
Median: 369
Q3: 1319.25
95th percentile: 1797
Maximum: 2770
Range: 2860
Interquartile range (IQR): 1319.25

Descriptive statistics

Standard deviation: 693.1163503
Coefficient of variation (CV): 1.037137833
Kurtosis: -1.292440135
Mean: 668.2972391
Median Absolute Deviation (MAD): 369
Skewness: 0.462402085
Sum: 39696856
Variance: 480410.2751
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
0 | 20438 | 34.4%
-15 | 60 | 0.1%
-16 | 55 | 0.1%
-13 | 55 | 0.1%
1290 | 52 | 0.1%
-20 | 52 | 0.1%
-14 | 51 | 0.1%
303 | 51 | 0.1%
-18 | 49 | 0.1%
-19 | 47 | 0.1%
Other values (2418) | 38490 | 64.8%

Minimum 10 values
Value | Count | Frequency (%)
-90 | 1 | < 0.1%
-63 | 2 | < 0.1%
-59 | 1 | < 0.1%
-57 | 1 | < 0.1%
-55 | 1 | < 0.1%
-54 | 1 | < 0.1%
-53 | 1 | < 0.1%
-52 | 2 | < 0.1%
-51 | 2 | < 0.1%
-50 | 5 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
2770 | 1 | < 0.1%
2628 | 1 | < 0.1%
2627 | 1 | < 0.1%
2626 | 2 | < 0.1%
2623 | 1 | < 0.1%
2614 | 1 | < 0.1%
2585 | 1 | < 0.1%
2576 | 1 | < 0.1%
2569 | 1 | < 0.1%
2568 | 1 | < 0.1%

installer
Categorical

HIGH CARDINALITY
MISSING

Distinct: 2145
Distinct (%): 3.8%
Missing: 3655
Missing (%): 6.2%
Memory size: 464.2 KiB

Value | Count
DWE | 17402
Government | 1825
RWE | 1206
Commu | 1060
DANIDA | 1050
Other values (2140) | 33202

Length

Max length: 30
Median length: 4
Mean length: 6.111202798
Min length: 1

Characters and Unicode

Total characters: 340669
Distinct characters: 70
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1098
Unique (%): 2.0%

Sample

1st row: Roman
2nd row: GRUMETI
3rd row: World vision
4th row: UNICEF
5th row: Artisan

Common Values

Value | Count | Frequency (%)
DWE | 17402 | 29.3%
Government | 1825 | 3.1%
RWE | 1206 | 2.0%
Commu | 1060 | 1.8%
DANIDA | 1050 | 1.8%
KKKT | 898 | 1.5%
Hesawa | 840 | 1.4%
0 | 777 | 1.3%
TCRS | 707 | 1.2%
Central government | 622 | 1.0%
Other values (2135) | 29358 | 49.4%
(Missing) | 3655 | 6.2%

Length

[Figure: Histogram of lengths of the category]

Words
Value | Count | Frequency (%)
dwe | 17601 | 25.8%
government | 2778 | 4.1%
water | 1881 | 2.8%
hesawa | 1395 | 2.0%
rwe | 1230 | 1.8%
district | 1216 | 1.8%
kkkt | 1153 | 1.7%
council | 1106 | 1.6%
commu | 1065 | 1.6%
danida | 1051 | 1.5%
Other values (1976) | 37806 | 55.4%

Most occurring characters

Value | Count | Frequency (%)
D | 27595 | 8.1%
W | 25849 | 7.6%
E | 25389 | 7.5%
a | 17343 | 5.1%
n | 16558 | 4.9%
e | 15500 | 4.5%
i | 15053 | 4.4%
A | 13668 | 4.0%
r | 13377 | 3.9%
t | 12904 | 3.8%
Other values (60) | 157433 | 46.2%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 167438 | 49.1%
Lowercase Letter | 158190 | 46.4%
Space Separator | 12673 | 3.7%
Other Punctuation | 971 | 0.3%
Decimal Number | 783 | 0.2%
Dash Punctuation | 268 | 0.1%
Connector Punctuation | 169 | < 0.1%
Open Punctuation | 159 | < 0.1%
Close Punctuation | 16 | < 0.1%
Currency Symbol | 2 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
D | 27595 | 16.5%
W | 25849 | 15.4%
E | 25389 | 15.2%
A | 13668 | 8.2%
C | 10535 | 6.3%
S | 6659 | 4.0%
R | 6518 | 3.9%
I | 6160 | 3.7%
T | 5948 | 3.6%
K | 5390 | 3.2%
Other values (16) | 33727 | 20.1%

Lowercase Letter
Value | Count | Frequency (%)
a | 17343 | 11.0%
n | 16558 | 10.5%
e | 15500 | 9.8%
i | 15053 | 9.5%
r | 13377 | 8.5%
t | 12904 | 8.2%
o | 12398 | 7.8%
m | 9289 | 5.9%
l | 6201 | 3.9%
s | 6173 | 3.9%
Other values (16) | 33394 | 21.1%

Other Punctuation
Value | Count | Frequency (%)
/ | 670 | 69.0%
. | 238 | 24.5%
& | 50 | 5.1%
' | 12 | 1.2%
# | 1 | 0.1%

Decimal Number
Value | Count | Frequency (%)
0 | 780 | 99.6%
1 | 1 | 0.1%
4 | 1 | 0.1%
9 | 1 | 0.1%

Close Punctuation
Value | Count | Frequency (%)
} | 13 | 81.2%
] | 2 | 12.5%
) | 1 | 6.2%

Open Punctuation
Value | Count | Frequency (%)
( | 157 | 98.7%
[ | 2 | 1.3%

Space Separator
Value | Count | Frequency (%)
(space) | 12673 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 169 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 268 | 100.0%

Currency Symbol
Value | Count | Frequency (%)
$ | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 325628 | 95.6%
Common | 15041 | 4.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
D | 27595 | 8.5%
W | 25849 | 7.9%
E | 25389 | 7.8%
a | 17343 | 5.3%
n | 16558 | 5.1%
e | 15500 | 4.8%
i | 15053 | 4.6%
A | 13668 | 4.2%
r | 13377 | 4.1%
t | 12904 | 4.0%
Other values (42) | 142392 | 43.7%

Common
Value | Count | Frequency (%)
(space) | 12673 | 84.3%
0 | 780 | 5.2%
/ | 670 | 4.5%
- | 268 | 1.8%
. | 238 | 1.6%
_ | 169 | 1.1%
( | 157 | 1.0%
& | 50 | 0.3%
} | 13 | 0.1%
' | 12 | 0.1%
Other values (8) | 11 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 340669 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
D | 27595 | 8.1%
W | 25849 | 7.6%
E | 25389 | 7.5%
a | 17343 | 5.1%
n | 16558 | 4.9%
e | 15500 | 4.5%
i | 15053 | 4.4%
A | 13668 | 4.0%
r | 13377 | 3.9%
t | 12904 | 3.8%
Other values (60) | 157433 | 46.2%

longitude
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 57516
Distinct (%): 96.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.07742669
Minimum: 0
Maximum: 40.34519307
Zeros: 1812
Zeros (%): 3.1%
Negative: 0
Negative (%): 0.0%
Memory size: 464.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 30.04066001
Q1: 33.09034738
Median: 34.90874343
Q3: 37.17838657
95th percentile: 39.13323954
Maximum: 40.34519307
Range: 40.34519307
Interquartile range (IQR): 4.08803919

Descriptive statistics

Standard deviation: 6.567431846
Coefficient of variation (CV): 0.1927208854
Kurtosis: 19.18703105
Mean: 34.07742669
Median Absolute Deviation (MAD): 2.032511095
Skewness: -4.191046455
Sum: 2024199.146
Variance: 43.13116105
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
0 | 1812 | 3.1%
32.9771906 | 2 | < 0.1%
32.91986139 | 2 | < 0.1%
37.54278497 | 2 | < 0.1%
39.10530661 | 2 | < 0.1%
32.98478963 | 2 | < 0.1%
39.10375198 | 2 | < 0.1%
37.54157917 | 2 | < 0.1%
37.28135697 | 2 | < 0.1%
37.32890522 | 2 | < 0.1%
Other values (57506) | 57570 | 96.9%

Minimum 10 values
Value | Count | Frequency (%)
0 | 1812 | 3.1%
29.6071219 | 1 | < 0.1%
29.60720109 | 1 | < 0.1%
29.61032056 | 1 | < 0.1%
29.61096482 | 1 | < 0.1%
29.61194674 | 1 | < 0.1%
29.61250689 | 1 | < 0.1%
29.61276296 | 1 | < 0.1%
29.61344309 | 1 | < 0.1%
29.6168718 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
40.34519307 | 1 | < 0.1%
40.34430089 | 1 | < 0.1%
40.32523996 | 1 | < 0.1%
40.32522643 | 1 | < 0.1%
40.32340181 | 1 | < 0.1%
40.32283237 | 1 | < 0.1%
40.32280453 | 1 | < 0.1%
40.3226251 | 1 | < 0.1%
40.32216902 | 1 | < 0.1%
40.32196593 | 1 | < 0.1%

latitude
Real number (ℝ)

HIGH CORRELATION

Distinct: 57517
Distinct (%): 96.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -5.70603266
Minimum: -11.64944018
Maximum: -2 × 10⁻⁸
Zeros: 0
Zeros (%): 0.0%
Negative: 59400
Negative (%): 100.0%
Memory size: 464.2 KiB

Quantile statistics

Minimum: -11.64944018
5th percentile: -10.58554992
Q1: -8.540621305
Median: -5.02159665
Q3: -3.32615564
95th percentile: -1.408872227
Maximum: -2 × 10⁻⁸
Range: 11.64944016
Interquartile range (IQR): 5.214465665

Descriptive statistics

Standard deviation: 2.946019081
Coefficient of variation (CV): -0.5162990219
Kurtosis: -1.057616666
Mean: -5.70603266
Median Absolute Deviation (MAD): 2.07002988
Skewness: -0.1520365709
Sum: -338938.34
Variance: 8.679028427
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
-2 × 10⁻⁸ | 1812 | 3.1%
-2.49454559 | 2 | < 0.1%
-6.98318263 | 2 | < 0.1%
-7.05692253 | 2 | < 0.1%
-7.05637235 | 2 | < 0.1%
-2.48708461 | 2 | < 0.1%
-6.98188419 | 2 | < 0.1%
-6.97826294 | 2 | < 0.1%
-7.06537264 | 2 | < 0.1%
-6.99129411 | 2 | < 0.1%
Other values (57507) | 57570 | 96.9%

Minimum 10 values
Value | Count | Frequency (%)
-11.64944018 | 1 | < 0.1%
-11.64837759 | 1 | < 0.1%
-11.58629656 | 1 | < 0.1%
-11.56857679 | 1 | < 0.1%
-11.56680457 | 1 | < 0.1%
-11.56450865 | 1 | < 0.1%
-11.56432357 | 1 | < 0.1%
-11.56231592 | 1 | < 0.1%
-11.56228898 | 1 | < 0.1%
-11.56161898 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
-2 × 10⁻⁸ | 1812 | 3.1%
-0.99846435 | 1 | < 0.1%
-0.998916 | 1 | < 0.1%
-0.99901209 | 1 | < 0.1%
-0.99911702 | 1 | < 0.1%
-0.9994692 | 1 | < 0.1%
-0.99950651 | 1 | < 0.1%
-0.99952232 | 1 | < 0.1%
-1.00058519 | 1 | < 0.1%
-1.0015208 | 1 | < 0.1%

wpt_name
Categorical

HIGH CARDINALITY

Distinct: 37400
Distinct (%): 63.0%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Value | Count
none | 3563
Shuleni | 1748
Zahanati | 830
Msikitini | 535
Kanisani | 323
Other values (37395) | 52401

Length

Max length: 30
Median length: 10
Mean length: 10.96210438
Min length: 1

Characters and Unicode

Total characters: 651149
Distinct characters: 75
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 32928
Unique (%): 55.4%

Sample

1st row: none
2nd row: Zahanati
3rd row: Kwa Mahundi
4th row: Zahanati Ya Nanyumbu
5th row: Shuleni

Common Values

Value | Count | Frequency (%)
none | 3563 | 6.0%
Shuleni | 1748 | 2.9%
Zahanati | 830 | 1.4%
Msikitini | 535 | 0.9%
Kanisani | 323 | 0.5%
Bombani | 271 | 0.5%
Sokoni | 260 | 0.4%
Ofisini | 254 | 0.4%
School | 208 | 0.4%
Shule Ya Msingi | 199 | 0.3%
Other values (37390) | 51209 | 86.2%

Length

[Figure: Histogram of lengths of the category]

Words
Value | Count | Frequency (%)
kwa | 21384 | 19.6%
none | 3565 | 3.3%
mzee | 3385 | 3.1%
shuleni | 2123 | 1.9%
ya | 1499 | 1.4%
shule | 1389 | 1.3%
school | 1113 | 1.0%
primary | 1052 | 1.0%
zahanati | 983 | 0.9%
msingi | 870 | 0.8%
Other values (29461) | 71931 | 65.8%

Most occurring characters

Value | Count | Frequency (%)
a | 98806 | 15.2%
i | 52404 | 8.0%
(space) | 49898 | 7.7%
n | 42148 | 6.5%
e | 40985 | 6.3%
w | 31669 | 4.9%
K | 31385 | 4.8%
o | 30247 | 4.6%
u | 24217 | 3.7%
M | 22040 | 3.4%
Other values (65) | 227350 | 34.9%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 493422 | 75.8%
Uppercase Letter | 105185 | 16.2%
Space Separator | 49898 | 7.7%
Decimal Number | 1680 | 0.3%
Other Punctuation | 741 | 0.1%
Dash Punctuation | 104 | < 0.1%
Open Punctuation | 37 | < 0.1%
Close Punctuation | 37 | < 0.1%
Connector Punctuation | 24 | < 0.1%
Modifier Symbol | 21 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 98806 | 20.0%
i | 52404 | 10.6%
n | 42148 | 8.5%
e | 40985 | 8.3%
w | 31669 | 6.4%
o | 30247 | 6.1%
u | 24217 | 4.9%
l | 20954 | 4.2%
m | 17631 | 3.6%
h | 17215 | 3.5%
Other values (16) | 117146 | 23.7%

Uppercase Letter
Value | Count | Frequency (%)
K | 31385 | 29.8%
M | 22040 | 21.0%
S | 10752 | 10.2%
N | 4880 | 4.6%
A | 3497 | 3.3%
B | 3425 | 3.3%
C | 2791 | 2.7%
P | 2564 | 2.4%
L | 2507 | 2.4%
J | 2385 | 2.3%
Other values (16) | 18959 | 18.0%

Decimal Number
Value | Count | Frequency (%)
1 | 507 | 30.2%
2 | 439 | 26.1%
3 | 152 | 9.0%
4 | 120 | 7.1%
7 | 106 | 6.3%
5 | 86 | 5.1%
6 | 80 | 4.8%
8 | 75 | 4.5%
9 | 70 | 4.2%
0 | 45 | 2.7%

Other Punctuation
Value | Count | Frequency (%)
' | 417 | 56.3%
. | 175 | 23.6%
/ | 146 | 19.7%
& | 2 | 0.3%
\ | 1 | 0.1%

Open Punctuation
Value | Count | Frequency (%)
( | 29 | 78.4%
[ | 8 | 21.6%

Close Punctuation
Value | Count | Frequency (%)
) | 29 | 78.4%
] | 8 | 21.6%

Space Separator
Value | Count | Frequency (%)
(space) | 49898 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 104 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 24 | 100.0%

Modifier Symbol
Value | Count | Frequency (%)
` | 21 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 598607 | 91.9%
Common | 52542 | 8.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 98806 | 16.5%
i | 52404 | 8.8%
n | 42148 | 7.0%
e | 40985 | 6.8%
w | 31669 | 5.3%
K | 31385 | 5.2%
o | 30247 | 5.1%
u | 24217 | 4.0%
M | 22040 | 3.7%
l | 20954 | 3.5%
Other values (42) | 203752 | 34.0%

Common
Value | Count | Frequency (%)
(space) | 49898 | 95.0%
1 | 507 | 1.0%
2 | 439 | 0.8%
' | 417 | 0.8%
. | 175 | 0.3%
3 | 152 | 0.3%
/ | 146 | 0.3%
4 | 120 | 0.2%
7 | 106 | 0.2%
- | 104 | 0.2%
Other values (13) | 478 | 0.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 651149 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 98806 | 15.2%
i | 52404 | 8.0%
(space) | 49898 | 7.7%
n | 42148 | 6.5%
e | 40985 | 6.3%
w | 31669 | 4.9%
K | 31385 | 4.8%
o | 30247 | 4.6%
u | 24217 | 3.7%
M | 22040 | 3.4%
Other values (65) | 227350 | 34.9%

num_private
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 65
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4741414141
Minimum: 0
Maximum: 1776
Zeros: 58643
Zeros (%): 98.7%
Negative: 0
Negative (%): 0.0%
Memory size: 464.2 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 1776
Range: 1776
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 12.23622981
Coefficient of variation (CV): 25.80713147
Kurtosis: 11137.29521
Mean: 0.4741414141
Median Absolute Deviation (MAD): 0
Skewness: 91.93374999
Sum: 28164
Variance: 149.72532
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
0 | 58643 | 98.7%
6 | 81 | 0.1%
1 | 73 | 0.1%
8 | 46 | 0.1%
5 | 46 | 0.1%
32 | 40 | 0.1%
45 | 36 | 0.1%
15 | 35 | 0.1%
39 | 30 | 0.1%
93 | 28 | < 0.1%
Other values (55) | 342 | 0.6%

Minimum 10 values
Value | Count | Frequency (%)
0 | 58643 | 98.7%
1 | 73 | 0.1%
2 | 23 | < 0.1%
3 | 27 | < 0.1%
4 | 20 | < 0.1%
5 | 46 | 0.1%
6 | 81 | 0.1%
7 | 26 | < 0.1%
8 | 46 | 0.1%
9 | 4 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
1776 | 1 | < 0.1%
1402 | 1 | < 0.1%
755 | 1 | < 0.1%
698 | 1 | < 0.1%
672 | 1 | < 0.1%
668 | 1 | < 0.1%
450 | 1 | < 0.1%
300 | 1 | < 0.1%
280 | 1 | < 0.1%
240 | 1 | < 0.1%

basin
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Value | Count
Lake Victoria | 10248
Pangani | 8940
Rufiji | 7976
Internal | 7785
Lake Tanganyika | 6432
Other values (4) | 18019

Length

Max length: 23
Median length: 10
Mean length: 10.8923569
Min length: 6

Characters and Unicode

Total characters: 647006
Distinct characters: 32
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Lake Nyasa
2nd row: Lake Victoria
3rd row: Pangani
4th row: Ruvuma / Southern Coast
5th row: Lake Victoria

Common Values

Value | Count | Frequency (%)
Lake Victoria | 10248 | 17.3%
Pangani | 8940 | 15.1%
Rufiji | 7976 | 13.4%
Internal | 7785 | 13.1%
Lake Tanganyika | 6432 | 10.8%
Wami / Ruvu | 5987 | 10.1%
Lake Nyasa | 5085 | 8.6%
Ruvuma / Southern Coast | 4493 | 7.6%
Lake Rukwa | 2454 | 4.1%

Length

[Figure: Histogram of lengths of the category]

[Figure: Pie chart]

Words
Value | Count | Frequency (%)
lake | 24219 | 22.2%
(empty string) | 10480 | 9.6%
victoria | 10248 | 9.4%
pangani | 8940 | 8.2%
rufiji | 7976 | 7.3%
internal | 7785 | 7.1%
tanganyika | 6432 | 5.9%
wami | 5987 | 5.5%
ruvu | 5987 | 5.5%
nyasa | 5085 | 4.7%
Other values (4) | 15933 | 14.6%

Most occurring characters

Value | Count | Frequency (%)
a | 107025 | 16.5%
i | 57807 | 8.9%
n | 50807 | 7.9%
(space) | 49672 | 7.7%
e | 36497 | 5.6%
u | 35883 | 5.5%
k | 33105 | 5.1%
t | 27019 | 4.2%
L | 24219 | 3.7%
r | 22526 | 3.5%
Other values (22) | 202446 | 31.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 488262 | 75.5%
Uppercase Letter | 98592 | 15.2%
Space Separator | 49672 | 7.7%
Other Punctuation | 10480 | 1.6%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 107025 | 21.9%
i | 57807 | 11.8%
n | 50807 | 10.4%
e | 36497 | 7.5%
u | 35883 | 7.3%
k | 33105 | 6.8%
t | 27019 | 5.5%
r | 22526 | 4.6%
o | 19234 | 3.9%
g | 15372 | 3.1%
Other values (10) | 82987 | 17.0%

Uppercase Letter
Value | Count | Frequency (%)
L | 24219 | 24.6%
R | 20910 | 21.2%
V | 10248 | 10.4%
P | 8940 | 9.1%
I | 7785 | 7.9%
T | 6432 | 6.5%
W | 5987 | 6.1%
N | 5085 | 5.2%
S | 4493 | 4.6%
C | 4493 | 4.6%

Space Separator
Value | Count | Frequency (%)
(space) | 49672 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 10480 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 586854 | 90.7%
Common | 60152 | 9.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 107025 | 18.2%
i | 57807 | 9.9%
n | 50807 | 8.7%
e | 36497 | 6.2%
u | 35883 | 6.1%
k | 33105 | 5.6%
t | 27019 | 4.6%
L | 24219 | 4.1%
r | 22526 | 3.8%
R | 20910 | 3.6%
Other values (20) | 171056 | 29.1%

Common
Value | Count | Frequency (%)
(space) | 49672 | 82.6%
/ | 10480 | 17.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 647006 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 107025 | 16.5%
i | 57807 | 8.9%
n | 50807 | 7.9%
(space) | 49672 | 7.7%
e | 36497 | 5.6%
u | 35883 | 5.5%
k | 33105 | 5.1%
t | 27019 | 4.2%
L | 24219 | 3.7%
r | 22526 | 3.5%
Other values (22) | 202446 | 31.3%

subvillage
Categorical

HIGH CARDINALITY

Distinct: 19287
Distinct (%): 32.7%
Missing: 371
Missing (%): 0.6%
Memory size: 464.2 KiB

Value | Count
Madukani | 508
Shuleni | 506
Majengo | 502
Kati | 373
Mtakuja | 262
Other values (19282) | 56878

Length

Max length30
Median length7
Mean length7.897592709
Min length1
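The length statistics are taken over the non-missing values, and they agree with the character total reported below: mean length × non-missing rows ≈ 7.8976 × 59029 ≈ 466187, the Total characters figure. A small sketch of the same four statistics plus the character total; `length_stats` is a hypothetical name and the sample strings are illustrative.

```python
import statistics


def length_stats(values):
    """Max/median/mean/min string length over non-missing values (sketch)."""
    lengths = [len(v) for v in values if v is not None]  # skip missing cells
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
        "total_chars": sum(lengths),  # equals mean * number of non-missing values
    }


print(length_stats(["Mnyusi B", "Nyamara", "Majengo", None, "M"]))
```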

Characters and Unicode

Total characters 466187
Distinct characters 73
Distinct categories 10
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
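The character tables group characters by their Unicode general category. Python exposes the same property through `unicodedata.category`, which returns short codes such as `Lu` and `Ll`; the mapping to the long names used in this report is written out by hand below and covers only the categories that appear here. Scripts (Latin, Common) and blocks (ASCII) are not available from the standard library, so this sketch stops at categories.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
# Hand-written mapping; it is not exhaustive.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pc": "Connector Punctuation",
    "Sk": "Modifier Symbol",
}


def category_counts(values):
    """Count characters per Unicode general category, skipping missing cells."""
    counts = Counter()
    for v in values:
        if v is None:
            continue
        for ch in v:
            code = unicodedata.category(ch)  # e.g. "Lu" for "M"
            counts[CATEGORY_NAMES.get(code, code)] += 1  # fall back to the short code
    return counts


print(category_counts(["Mnyusi B", "Kati 2", None, "wanging'ombe"]))
```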

Unique

Unique 9424
Unique (%) 16.0%

Sample

1st rowMnyusi B
2nd rowNyamara
3rd rowMajengo
4th rowMahakamani
5th rowKyanyamisa

Common Values

ValueCountFrequency (%)
Madukani508
 
0.9%
Shuleni506
 
0.9%
Majengo502
 
0.8%
Kati373
 
0.6%
Mtakuja262
 
0.4%
Sokoni232
 
0.4%
M187
 
0.3%
Muungano172
 
0.3%
Mbuyuni164
 
0.3%
Mlimani152
 
0.3%
Other values (19277)55971
94.2%
(Missing)371
 
0.6%
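In the Common Values table every percentage is taken over all 59400 rows, missing ones included, so the rows sum to 100% up to rounding: Madukani is 508 / 59400 ≈ 0.9%, and Other values (19277) collects the remaining 55971 non-missing rows. A sketch of that layout, with top-k rows, an "Other values (n)" remainder row and a "(Missing)" row; the function and sample data are illustrative.

```python
from collections import Counter


def common_values(values, k=10):
    """Top-k rows, an "Other values (n)" row and a "(Missing)" row, in report style."""
    n_total = len(values)  # denominator includes missing cells
    counts = Counter(v for v in values if v is not None)
    ranked = counts.most_common()
    rows = [(v, c, 100 * c / n_total) for v, c in ranked[:k]]
    rest = ranked[k:]
    if rest:
        rest_count = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", rest_count, 100 * rest_count / n_total))
    n_missing = n_total - sum(counts.values())
    if n_missing:
        rows.append(("(Missing)", n_missing, 100 * n_missing / n_total))
    return rows


data = ["Kati", "Kati", "Kati", "Shuleni", "Shuleni", "Mlimani", "Sokoni", None]
for value, count, pct in common_values(data, k=2):
    print(f"{value}\t{count}\t{pct:.1f}%")
```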

Length

Histogram of lengths of the category

Most occurring words

ValueCountFrequency (%)
a2387
 
3.4%
b2043
 
2.9%
kati1902
 
2.7%
majengo610
 
0.9%
wa600
 
0.8%
shuleni593
 
0.8%
madukani569
 
0.8%
mtaa514
 
0.7%
juu403
 
0.6%
mjini378
 
0.5%
Other values (17024)60795
85.9%

Most occurring characters

ValueCountFrequency (%)
a72003
15.4%
i45666
 
9.8%
n33499
 
7.2%
u26424
 
5.7%
e25671
 
5.5%
o23556
 
5.1%
M20431
 
4.4%
g18951
 
4.1%
l16372
 
3.5%
m15053
 
3.2%
Other values (63)168561
36.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter381263
81.8%
Uppercase Letter71291
 
15.3%
Space Separator11766
 
2.5%
Other Punctuation1184
 
0.3%
Decimal Number589
 
0.1%
Modifier Symbol45
 
< 0.1%
Dash Punctuation36
 
< 0.1%
Open Punctuation5
 
< 0.1%
Close Punctuation5
 
< 0.1%
Connector Punctuation3
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a72003
18.9%
i45666
12.0%
n33499
 
8.8%
u26424
 
6.9%
e25671
 
6.7%
o23556
 
6.2%
g18951
 
5.0%
l16372
 
4.3%
m15053
 
3.9%
b11843
 
3.1%
Other values (16)92225
24.2%
Uppercase Letter
ValueCountFrequency (%)
M20431
28.7%
K12545
17.6%
N6068
 
8.5%
B5112
 
7.2%
I4503
 
6.3%
S4039
 
5.7%
A3076
 
4.3%
C2533
 
3.6%
L2458
 
3.4%
U1704
 
2.4%
Other values (15)8822
12.4%
Decimal Number
Value Count Frequency (%)
1 242 41.1%
2 70 11.9%
3 50 8.5%
4 49 8.3%
6 33 5.6%
8 32 5.4%
9 32 5.4%
0 30 5.1%
5 29 4.9%
7 22 3.7%
Other Punctuation
ValueCountFrequency (%)
'1017
85.9%
/136
 
11.5%
.29
 
2.4%
#2
 
0.2%
Open Punctuation
ValueCountFrequency (%)
(4
80.0%
[1
 
20.0%
Close Punctuation
ValueCountFrequency (%)
)4
80.0%
]1
 
20.0%
Space Separator
ValueCountFrequency (%)
(space) 11766
100.0%
Modifier Symbol
ValueCountFrequency (%)
`45
100.0%
Dash Punctuation
ValueCountFrequency (%)
-36
100.0%
Connector Punctuation
ValueCountFrequency (%)
_3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin452554
97.1%
Common13633
 
2.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a72003
15.9%
i45666
 
10.1%
n33499
 
7.4%
u26424
 
5.8%
e25671
 
5.7%
o23556
 
5.2%
M20431
 
4.5%
g18951
 
4.2%
l16372
 
3.6%
m15053
 
3.3%
Other values (41)154928
34.2%
Common
Value Count Frequency (%)
(space) 11766 86.3%
' 1017 7.5%
1 242 1.8%
/ 136 1.0%
2 70 0.5%
3 50 0.4%
4 49 0.4%
` 45 0.3%
- 36 0.3%
6 33 0.2%
Other values (12) 189 1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII466187
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a72003
15.4%
i45666
 
9.8%
n33499
 
7.2%
u26424
 
5.7%
e25671
 
5.5%
o23556
 
5.1%
M20431
 
4.4%
g18951
 
4.1%
l16372
 
3.5%
m15053
 
3.2%
Other values (63)168561
36.2%

region
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct21
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size464.2 KiB
Iringa
5294 
Shinyanga
4982 
Mbeya
4639 
Kilimanjaro
4379 
Morogoro
4006 
Other values (16)
36100 

Length

Max length13
Median length6
Mean length6.623754209
Min length4

Characters and Unicode

Total characters 393451
Distinct characters 32
Distinct categories 3
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st rowIringa
2nd rowMara
3rd rowManyara
4th rowMtwara
5th rowKagera

Common Values

ValueCountFrequency (%)
Iringa5294
 
8.9%
Shinyanga4982
 
8.4%
Mbeya4639
 
7.8%
Kilimanjaro4379
 
7.4%
Morogoro4006
 
6.7%
Arusha3350
 
5.6%
Kagera3316
 
5.6%
Mwanza3102
 
5.2%
Kigoma2816
 
4.7%
Ruvuma2640
 
4.4%
Other values (11)20876
35.1%

Length

Histogram of lengths of the category

Most occurring words

ValueCountFrequency (%)
iringa5294
 
8.7%
shinyanga4982
 
8.2%
mbeya4639
 
7.6%
kilimanjaro4379
 
7.2%
morogoro4006
 
6.6%
arusha3350
 
5.5%
kagera3316
 
5.4%
mwanza3102
 
5.1%
kigoma2816
 
4.6%
ruvuma2640
 
4.3%
Other values (13)22486
36.9%

Most occurring characters

ValueCountFrequency (%)
a83413
21.2%
n33143
 
8.4%
r32397
 
8.2%
i31763
 
8.1%
o29580
 
7.5%
g25054
 
6.4%
M17029
 
4.3%
m12841
 
3.3%
y11204
 
2.8%
K10511
 
2.7%
Other values (22)106516
27.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter331636
84.3%
Uppercase Letter60205
 
15.3%
Space Separator1610
 
0.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a83413
25.2%
n33143
 
10.0%
r32397
 
9.8%
i31763
 
9.6%
o29580
 
8.9%
g25054
 
7.6%
m12841
 
3.9%
y11204
 
3.4%
u10438
 
3.1%
w9275
 
2.8%
Other values (11)52528
15.8%
Uppercase Letter
ValueCountFrequency (%)
M17029
28.3%
K10511
17.5%
S7880
13.1%
I5294
 
8.8%
T4506
 
7.5%
R4448
 
7.4%
A3350
 
5.6%
D3006
 
5.0%
P2635
 
4.4%
L1546
 
2.6%
Space Separator
ValueCountFrequency (%)
(space) 1610
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin391841
99.6%
Common1610
 
0.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
a83413
21.3%
n33143
 
8.5%
r32397
 
8.3%
i31763
 
8.1%
o29580
 
7.5%
g25054
 
6.4%
M17029
 
4.3%
m12841
 
3.3%
y11204
 
2.9%
K10511
 
2.7%
Other values (21)104906
26.8%
Common
ValueCountFrequency (%)
(space) 1610
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII393451
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a83413
21.2%
n33143
 
8.4%
r32397
 
8.2%
i31763
 
8.1%
o29580
 
7.5%
g25054
 
6.4%
M17029
 
4.3%
m12841
 
3.3%
y11204
 
2.8%
K10511
 
2.7%
Other values (22)106516
27.1%

region_code
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct27
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean15.29700337
Minimum1
Maximum99
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size464.2 KiB

Quantile statistics

Minimum 1
5th percentile 2
Q1 5
Median 12
Q3 17
95th percentile 60
Maximum 99
Range 98
Interquartile range (IQR) 12

Descriptive statistics

Standard deviation17.58740634
Coefficient of variation (CV)1.149728866
Kurtosis10.28843341
Mean15.29700337
Median Absolute Deviation (MAD)6
Skewness3.17381811
Sum908642
Variance309.3168617
MonotonicityNot monotonic
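These statistics are tied together by simple formulas, which gives a quick consistency check: IQR = Q3 - Q1 = 17 - 5 = 12, Range = 99 - 1 = 98, and CV = standard deviation / mean = 17.5874 / 15.2970 ≈ 1.1497. The sketch below recomputes them with Python's standard library. It assumes linear interpolation between order statistics for the quartiles, which is what `statistics.quantiles(..., method="inclusive")` does; the report does not state its interpolation rule here, so small differences are possible on other data. MAD is taken as the median of absolute deviations from the median. `describe` and the sample values are illustrative.

```python
import statistics


def describe(xs):
    """Quartiles, IQR, range, CV and MAD for a list of numbers (sketch)."""
    # Linear interpolation between order statistics (numpy's default rule).
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    mean = statistics.fmean(xs)
    std = statistics.stdev(xs)  # sample standard deviation (n - 1 denominator)
    return {
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "IQR": q3 - q1,
        "range": max(xs) - min(xs),
        "CV": std / mean,
        "MAD": statistics.median(abs(x - median) for x in xs),
    }


print(describe([1, 2, 3, 5, 11, 12, 12, 17, 18, 99]))
```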
Histogram with fixed size bins (bins=27)

Common values
Value Count Frequency (%)
11 5300 8.9%
17 5011 8.4%
12 4639 7.8%
3 4379 7.4%
5 4040 6.8%
18 3324 5.6%
19 3047 5.1%
2 3024 5.1%
16 2816 4.7%
10 2640 4.4%
Other values (17) 21180 35.7%

Minimum 10 values
Value Count Frequency (%)
1 2201 3.7%
2 3024 5.1%
3 4379 7.4%
4 2513 4.2%
5 4040 6.8%
6 1609 2.7%
7 805 1.4%
8 300 0.5%
9 390 0.7%
10 2640 4.4%

Maximum 10 values
Value Count Frequency (%)
99 423 0.7%
90 917 1.5%
80 1238 2.1%
60 1025 1.7%
40 1 < 0.1%
24 326 0.5%
21 1583 2.7%
20 1969 3.3%
19 3047 5.1%
18 3324 5.6%

district_code
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct20
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean5.629747475
Minimum0
Maximum80
Zeros23
Zeros (%)< 0.1%
Negative0
Negative (%)0.0%
Memory size464.2 KiB

Quantile statistics

Minimum 0
5th percentile 1
Q1 2
Median 3
Q3 5
95th percentile 30
Maximum 80
Range 80
Interquartile range (IQR) 3

Descriptive statistics

Standard deviation9.633648629
Coefficient of variation (CV)1.711204396
Kurtosis16.21428363
Mean5.629747475
Median Absolute Deviation (MAD)1
Skewness3.962045299
Sum334407
Variance92.80718592
MonotonicityNot monotonic
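district_code has positive skewness (3.96) and a large kurtosis (16.2) because about 93% of rows sit at codes 1 to 8 while a thin tail reaches 30 to 80. The sketch computes the bias-adjusted sample skewness (G1) and excess kurtosis (G2, Fisher's definition, so a normal distribution scores 0). pandas describes `Series.skew` and `Series.kurt` as unbiased estimators of this kind; that the report used exactly these formulas is an assumption. `skew_kurt` is a hypothetical name.

```python
import math


def skew_kurt(xs):
    """Adjusted sample skewness (G1) and excess kurtosis (G2); needs len(xs) > 3."""
    n = len(xs)
    mean = sum(xs) / n
    # Central moments in population form (divided by n).
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    return skew, kurt


print(skew_kurt([1, 2, 3, 4, 5]))    # symmetric: skewness 0
print(skew_kurt([1, 1, 1, 2, 10]))   # long right tail: positive skewness
```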
Histogram with fixed size bins (bins=20)

Common values
Value Count Frequency (%)
1 12203 20.5%
2 11173 18.8%
3 9998 16.8%
4 8999 15.1%
5 4356 7.3%
6 4074 6.9%
7 3343 5.6%
8 1043 1.8%
30 995 1.7%
33 874 1.5%
Other values (10) 2342 3.9%

Minimum 10 values
Value Count Frequency (%)
0 23 < 0.1%
1 12203 20.5%
2 11173 18.8%
3 9998 16.8%
4 8999 15.1%
5 4356 7.3%
6 4074 6.9%
7 3343 5.6%
8 1043 1.8%
13 391 0.7%

Maximum 10 values
Value Count Frequency (%)
80 12 < 0.1%
67 6 < 0.1%
63 195 0.3%
62 109 0.2%
60 63 0.1%
53 745 1.3%
43 505 0.9%
33 874 1.5%
30 995 1.7%
23 293 0.5%

lga
Categorical

HIGH CARDINALITY

Distinct125
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size464.2 KiB
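lga is flagged HIGH CARDINALITY with 125 distinct values, while region, with 21, is not. A sketch of a distinct-count check over a dict-of-lists table; the default threshold of 50 is an assumption picked to be consistent with those two cases, not a documented setting of the profiler, and the column contents are synthetic.

```python
def high_cardinality(columns, threshold=50):
    """Return names of columns whose distinct non-missing count exceeds threshold."""
    flagged = []
    for name, values in columns.items():
        n_distinct = len({v for v in values if v is not None})
        if n_distinct > threshold:
            flagged.append(name)
    return flagged


columns = {
    "lga": [f"lga_{i}" for i in range(125)],             # 125 distinct values
    "region": [f"region_{i % 21}" for i in range(125)],  # 21 distinct values
}
print(high_cardinality(columns))
```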
Njombe
 
2503
Arusha Rural
 
1252
Moshi Rural
 
1251
Bariadi
 
1177
Rungwe
 
1106
Other values (120)
52111 

Length

Max length16
Median length6
Mean length7.416885522
Min length3

Characters and Unicode

Total characters 440563
Distinct characters 41
Distinct categories 3
Distinct scripts 2
Distinct blocks 1

Unique

Unique 1
Unique (%) < 0.1%

Sample

1st rowLudewa
2nd rowSerengeti
3rd rowSimanjiro
4th rowNanyumbu
5th rowKaragwe

Common Values

ValueCountFrequency (%)
Njombe2503
 
4.2%
Arusha Rural1252
 
2.1%
Moshi Rural1251
 
2.1%
Bariadi1177
 
2.0%
Rungwe1106
 
1.9%
Kilosa1094
 
1.8%
Kasulu1047
 
1.8%
Mbozi1034
 
1.7%
Meru1009
 
1.7%
Bagamoyo997
 
1.7%
Other values (115)46930
79.0%

Length

Histogram of lengths of the category

Most occurring words

ValueCountFrequency (%)
rural9552
 
13.5%
njombe2503
 
3.5%
urban1683
 
2.4%
moshi1330
 
1.9%
arusha1315
 
1.9%
bariadi1177
 
1.7%
singida1172
 
1.7%
rungwe1106
 
1.6%
kilosa1094
 
1.5%
kasulu1047
 
1.5%
Other values (106)48656
68.9%
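The word table appears to split each value on whitespace after lowercasing, so "Arusha Rural" contributes one "arusha" and one "rural"; that is why rural (9552) outranks every individual LGA name. A sketch of that tokenization, assuming plain whitespace splitting with no stop-word removal or punctuation stripping; the sample values are illustrative.

```python
from collections import Counter


def word_counts(values):
    """Lowercase each non-missing value, split on whitespace, count the words."""
    counts = Counter()
    for v in values:
        if v is None:
            continue
        counts.update(v.lower().split())
    return counts


c = word_counts(["Arusha Rural", "Moshi Rural", "Njombe", "Arusha Urban", None])
print(c.most_common(3))
```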

Most occurring characters

ValueCountFrequency (%)
a69982
15.9%
o30079
 
6.8%
i29483
 
6.7%
u28324
 
6.4%
r26886
 
6.1%
e22579
 
5.1%
n22521
 
5.1%
l19238
 
4.4%
g18385
 
4.2%
M16017
 
3.6%
Other values (31)157069
35.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter358693
81.4%
Uppercase Letter70635
 
16.0%
Space Separator11235
 
2.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a69982
19.5%
o30079
 
8.4%
i29483
 
8.2%
u28324
 
7.9%
r26886
 
7.5%
e22579
 
6.3%
n22521
 
6.3%
l19238
 
5.4%
g18385
 
5.1%
m15622
 
4.4%
Other values (14)75594
21.1%
Uppercase Letter
ValueCountFrequency (%)
M16017
22.7%
R12207
17.3%
K11663
16.5%
S6261
 
8.9%
N5760
 
8.2%
B4839
 
6.9%
U3410
 
4.8%
I2480
 
3.5%
L2131
 
3.0%
T1367
 
1.9%
Other values (6)4500
 
6.4%
Space Separator
ValueCountFrequency (%)
(space) 11235
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin429328
97.4%
Common11235
 
2.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
a69982
16.3%
o30079
 
7.0%
i29483
 
6.9%
u28324
 
6.6%
r26886
 
6.3%
e22579
 
5.3%
n22521
 
5.2%
l19238
 
4.5%
g18385
 
4.3%
M16017
 
3.7%
Other values (30)145834
34.0%
Common
ValueCountFrequency (%)
(space) 11235
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII440563
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a69982
15.9%
o30079
 
6.8%
i29483
 
6.7%
u28324
 
6.4%
r26886
 
6.1%
e22579
 
5.1%
n22521
 
5.1%
l19238
 
4.4%
g18385
 
4.2%
M16017
 
3.6%
Other values (31)157069
35.7%

ward
Categorical

HIGH CARDINALITY

Distinct2092
Distinct (%)3.5%
Missing0
Missing (%)0.0%
Memory size464.2 KiB
Igosi
 
307
Imalinyi
 
252
Siha Kati
 
232
Mdandu
 
231
Nduruma
 
217
Other values (2087)
58161 

Length

Max length23
Median length7
Mean length7.505841751
Min length3

Characters and Unicode

Total characters 445847
Distinct characters 54
Distinct categories 5
Distinct scripts 2
Distinct blocks 1

Unique

Unique 30
Unique (%) 0.1%

Sample

1st rowMundindi
2nd rowNatta
3rd rowNgorika
4th rowNanyumbu
5th rowNyakasimbi

Common Values

ValueCountFrequency (%)
Igosi307
 
0.5%
Imalinyi252
 
0.4%
Siha Kati232
 
0.4%
Mdandu231
 
0.4%
Nduruma217
 
0.4%
Mishamo203
 
0.3%
Kitunda203
 
0.3%
Msindo201
 
0.3%
Chalinze196
 
0.3%
Maji ya Chai190
 
0.3%
Other values (2082)57168
96.2%

Length

Histogram of lengths of the category

Most occurring words

ValueCountFrequency (%)
mashariki580
 
0.9%
urban540
 
0.8%
siha434
 
0.7%
kusini393
 
0.6%
magharibi362
 
0.6%
igosi307
 
0.5%
masama303
 
0.5%
machame293
 
0.5%
kati270
 
0.4%
imalinyi252
 
0.4%
Other values (2106)61033
94.2%

Most occurring characters

ValueCountFrequency (%)
a69533
15.6%
i40243
 
9.0%
n29584
 
6.6%
u27015
 
6.1%
o26093
 
5.9%
e23589
 
5.3%
g21166
 
4.7%
M18916
 
4.2%
m16216
 
3.6%
l15799
 
3.5%
Other values (44)157693
35.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter374730
84.0%
Uppercase Letter64523
 
14.5%
Space Separator5408
 
1.2%
Other Punctuation1163
 
0.3%
Dash Punctuation23
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
M18916
29.3%
K11212
17.4%
I6094
 
9.4%
N5919
 
9.2%
S3354
 
5.2%
L3162
 
4.9%
B3098
 
4.8%
U2913
 
4.5%
C2123
 
3.3%
R1692
 
2.6%
Other values (15)6040
 
9.4%
Lowercase Letter
ValueCountFrequency (%)
a69533
18.6%
i40243
10.7%
n29584
 
7.9%
u27015
 
7.2%
o26093
 
7.0%
e23589
 
6.3%
g21166
 
5.6%
m16216
 
4.3%
l15799
 
4.2%
r13057
 
3.5%
Other values (15)92435
24.7%
Other Punctuation
ValueCountFrequency (%)
'1013
87.1%
/150
 
12.9%
Space Separator
ValueCountFrequency (%)
(space) 5408
100.0%
Dash Punctuation
ValueCountFrequency (%)
-23
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin439253
98.5%
Common6594
 
1.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
a69533
15.8%
i40243
 
9.2%
n29584
 
6.7%
u27015
 
6.2%
o26093
 
5.9%
e23589
 
5.4%
g21166
 
4.8%
M18916
 
4.3%
m16216
 
3.7%
l15799
 
3.6%
Other values (40)151099
34.4%
Common
ValueCountFrequency (%)
(space) 5408
82.0%
'1013
 
15.4%
/150
 
2.3%
-23
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII445847
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a69533
15.6%
i40243
 
9.0%
n29584
 
6.6%
u27015
 
6.1%
o26093
 
5.9%
e23589
 
5.3%
g21166
 
4.7%
M18916
 
4.2%
m16216
 
3.6%
l15799
 
3.5%
Other values (44)157693
35.4%

population
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct1049
Distinct (%)1.8%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean179.9099832
Minimum0
Maximum30500
Zeros21381
Zeros (%)36.0%
Negative0
Negative (%)0.0%
Memory size464.2 KiB

Quantile statistics

Minimum 0
5th percentile 0
Q1 0
Median 25
Q3 215
95th percentile 680
Maximum 30500
Range 30500
Interquartile range (IQR) 215

Descriptive statistics

Standard deviation471.4821757
Coefficient of variation (CV)2.620655994
Kurtosis402.2801153
Mean179.9099832
Median Absolute Deviation (MAD)25
Skewness12.66071359
Sum10686653
Variance222295.442
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
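With fixed-size bins the population range 0 to 30500 is cut into 50 equal bins of width 30500 / 50 = 610, so the 21381 zeros and every other value below 610 land in the first bar. A sketch of equal-width binning, assuming the bins span the observed minimum to maximum (the default range of `numpy.histogram`) with the last bin closed on the right; `fixed_width_histogram` is a hypothetical name.

```python
def fixed_width_histogram(xs, bins):
    """Equal-width bins from min(xs) to max(xs); returns (edges, counts)."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in xs:
        i = int((x - lo) / width) if width else 0
        counts[min(i, bins - 1)] += 1  # the maximum falls into the last, closed bin
    edges = [lo + k * width for k in range(bins + 1)]
    return edges, counts


edges, counts = fixed_width_histogram([0, 0, 1, 200, 610, 30500], bins=50)
print(edges[:3], counts[:3], counts[-1])
```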

Common values
Value Count Frequency (%)
0 21381 36.0%
1 7025 11.8%
200 1940 3.3%
150 1892 3.2%
250 1681 2.8%
300 1476 2.5%
100 1146 1.9%
50 1139 1.9%
500 1009 1.7%
350 986 1.7%
Other values (1039) 19725 33.2%

Minimum 10 values
Value Count Frequency (%)
0 21381 36.0%
1 7025 11.8%
2 4 < 0.1%
3 4 < 0.1%
4 13 < 0.1%
5 44 0.1%
6 19 < 0.1%
7 3 < 0.1%
8 23 < 0.1%
9 11 < 0.1%

Maximum 10 values
Value Count Frequency (%)
30500 1 < 0.1%
15300 1 < 0.1%
11463 1 < 0.1%
10000 3 < 0.1%
9865 1 < 0.1%
9500 1 < 0.1%
9000 3 < 0.1%
8848 1 < 0.1%
8600 1 < 0.1%
8500 1 < 0.1%

public_meeting
Boolean

HIGH CORRELATION
MISSING

Distinct2
Distinct (%)< 0.1%
Missing3334
Missing (%)5.6%
Memory size464.2 KiB
True
51011 
False
 
5055
(Missing)
 
3334
ValueCountFrequency (%)
True51011
85.9%
False5055
 
8.5%
(Missing)3334
 
5.6%
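For the Boolean variables the three rows (True, False, missing) share the same denominator of 59400 rows, so they sum to 100%: 85.9 + 8.5 + 5.6. A sketch of that summary; `boolean_summary` is a hypothetical helper.

```python
from collections import Counter


def boolean_summary(values):
    """Count True / False / missing and express each as a percentage of all rows."""
    n_total = len(values)
    counts = Counter("(Missing)" if v is None else v for v in values)
    return {key: (c, round(100 * c / n_total, 1)) for key, c in counts.items()}


print(boolean_summary([True, True, False, None]))
```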

recorded_by
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size464.2 KiB
GeoData Consultants Ltd
59400 

Length

Max length23
Median length23
Mean length23
Min length23

Characters and Unicode

Total characters 1366200
Distinct characters 14
Distinct categories 3
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st rowGeoData Consultants Ltd
2nd rowGeoData Consultants Ltd
3rd rowGeoData Consultants Ltd
4th rowGeoData Consultants Ltd
5th rowGeoData Consultants Ltd

Common Values

ValueCountFrequency (%)
GeoData Consultants Ltd59400
100.0%
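recorded_by holds the single value "GeoData Consultants Ltd" in all 59400 rows, which is why it is flagged CONSTANT and marked REJECTED: a constant column cannot help distinguish one row from another. A sketch for finding and dropping such columns from a dict-of-lists table; the table layout and helper names are illustrative.

```python
def constant_columns(columns):
    """Names of columns with at most one distinct non-missing value.

    An all-missing column also counts as constant here.
    """
    return [
        name
        for name, values in columns.items()
        if len({v for v in values if v is not None}) <= 1
    ]


def drop_columns(columns, names):
    """Return a copy of the table without the given columns."""
    return {name: values for name, values in columns.items() if name not in names}


table = {
    "recorded_by": ["GeoData Consultants Ltd"] * 4,
    "region": ["Iringa", "Mara", "Manyara", "Mtwara"],
}
const = constant_columns(table)
print(const)
print(list(drop_columns(table, const)))
```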

Length

Histogram of lengths of the category

Pie chart

Most occurring words

ValueCountFrequency (%)
consultants59400
33.3%
ltd59400
33.3%
geodata59400
33.3%

Most occurring characters

ValueCountFrequency (%)
t237600
17.4%
a178200
13.0%
o118800
8.7%
(space) 118800
8.7%
n118800
8.7%
s118800
8.7%
G59400
 
4.3%
e59400
 
4.3%
D59400
 
4.3%
C59400
 
4.3%
Other values (4)237600
17.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1009800
73.9%
Uppercase Letter237600
 
17.4%
Space Separator118800
 
8.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t237600
23.5%
a178200
17.6%
o118800
11.8%
n118800
11.8%
s118800
11.8%
e59400
 
5.9%
u59400
 
5.9%
l59400
 
5.9%
d59400
 
5.9%
Uppercase Letter
ValueCountFrequency (%)
G59400
25.0%
D59400
25.0%
C59400
25.0%
L59400
25.0%
Space Separator
ValueCountFrequency (%)
(space) 118800
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1247400
91.3%
Common118800
 
8.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
t237600
19.0%
a178200
14.3%
o118800
9.5%
n118800
9.5%
s118800
9.5%
G59400
 
4.8%
e59400
 
4.8%
D59400
 
4.8%
C59400
 
4.8%
u59400
 
4.8%
Other values (3)178200
14.3%
Common
ValueCountFrequency (%)
(space) 118800
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1366200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t237600
17.4%
a178200
13.0%
o118800
8.7%
(space) 118800
8.7%
n118800
8.7%
s118800
8.7%
G59400
 
4.3%
e59400
 
4.3%
D59400
 
4.3%
C59400
 
4.3%
Other values (4)237600
17.4%

scheme_management
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct12
Distinct (%)< 0.1%
Missing3877
Missing (%)6.5%
Memory size464.2 KiB
VWC
36793 
WUG
5206 
Water authority
 
3153
WUA
 
2883
Water Board
 
2748
Other values (7)
4740 

Length

Max length16
Median length3
Mean length4.644723808
Min length3

Characters and Unicode

Total characters 257889
Distinct characters 29
Distinct categories 3
Distinct scripts 2
Distinct blocks 1

Unique

Unique 1
Unique (%) < 0.1%

Sample

1st rowVWC
2nd rowOther
3rd rowVWC
4th rowVWC
5th rowVWC

Common Values

ValueCountFrequency (%)
VWC36793
61.9%
WUG5206
 
8.8%
Water authority3153
 
5.3%
WUA2883
 
4.9%
Water Board2748
 
4.6%
Parastatal1680
 
2.8%
Private operator1063
 
1.8%
Company1061
 
1.8%
Other766
 
1.3%
SWC97
 
0.2%
Other values (2)73
 
0.1%
(Missing)3877
 
6.5%

Length

Histogram of lengths of the category

Most occurring words

ValueCountFrequency (%)
vwc36793
58.9%
water5901
 
9.4%
wug5206
 
8.3%
authority3153
 
5.0%
wua2883
 
4.6%
board2748
 
4.4%
parastatal1680
 
2.7%
operator1063
 
1.7%
private1063
 
1.7%
company1061
 
1.7%
Other values (4)936
 
1.5%

Most occurring characters

ValueCountFrequency (%)
W50880
19.7%
C37951
14.7%
V36793
14.3%
a21709
8.4%
t18531
 
7.2%
r17509
 
6.8%
o9089
 
3.5%
e8794
 
3.4%
U8089
 
3.1%
(space) 6964
 
2.7%
Other values (19)41580
16.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter148229
57.5%
Lowercase Letter102696
39.8%
Space Separator6964
 
2.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a21709
21.1%
t18531
18.0%
r17509
17.0%
o9089
8.9%
e8794
8.6%
i4216
 
4.1%
y4214
 
4.1%
h3919
 
3.8%
u3225
 
3.1%
d2748
 
2.7%
Other values (6)8742
8.5%
Uppercase Letter
ValueCountFrequency (%)
W50880
34.3%
C37951
25.6%
V36793
24.8%
U8089
 
5.5%
G5206
 
3.5%
A2883
 
1.9%
B2748
 
1.9%
P2743
 
1.9%
O766
 
0.5%
S97
 
0.1%
Other values (2)73
 
< 0.1%
Space Separator
ValueCountFrequency (%)
(space) 6964
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin250925
97.3%
Common6964
 
2.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
W50880
20.3%
C37951
15.1%
V36793
14.7%
a21709
8.7%
t18531
 
7.4%
r17509
 
7.0%
o9089
 
3.6%
e8794
 
3.5%
U8089
 
3.2%
G5206
 
2.1%
Other values (18)36374
14.5%
Common
ValueCountFrequency (%)
(space) 6964
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII257889
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
W50880
19.7%
C37951
14.7%
V36793
14.3%
a21709
8.4%
t18531
 
7.2%
r17509
 
6.8%
o9089
 
3.5%
e8794
 
3.4%
U8089
 
3.1%
(space) 6964
 
2.7%
Other values (19)41580
16.1%

scheme_name
Categorical

HIGH CARDINALITY
MISSING

Distinct2696
Distinct (%)8.6%
Missing28166
Missing (%)47.4%
Memory size464.2 KiB
K
 
682
None
 
644
Borehole
 
546
Chalinze wate
 
405
M
 
400
Other values (2691)
28557 

Length

Max length46
Median length13
Mean length14.30521227
Min length1

Characters and Unicode

Total characters 446809
Distinct characters 68
Distinct categories 9
Distinct scripts 2
Distinct blocks 1

Unique

Unique 712
Unique (%) 2.3%

Sample

1st rowRoman
2nd rowNyumba ya mungu pipe scheme
3rd rowZingibali
4th rowBL Bondeni
5th rowNone

Common Values

ValueCountFrequency (%)
K682
 
1.1%
None644
 
1.1%
Borehole546
 
0.9%
Chalinze wate405
 
0.7%
M400
 
0.7%
DANIDA379
 
0.6%
Government320
 
0.5%
Ngana water supplied scheme270
 
0.5%
wanging'ombe water supply s261
 
0.4%
wanging'ombe supply scheme234
 
0.4%
Other values (2686)27093
45.6%
(Missing)28166
47.4%

Length

Histogram of lengths of the category

Most occurring words

ValueCountFrequency (%)
water9770
 
13.6%
supply6745
 
9.4%
scheme2532
 
3.5%
wa2157
 
3.0%
gravity1914
 
2.7%
pipe1346
 
1.9%
maji1343
 
1.9%
mradi1097
 
1.5%
line1016
 
1.4%
supplied877
 
1.2%
Other values (2506)43219
60.0%

Most occurring characters

ValueCountFrequency (%)
a48584
 
10.9%
(space) 41252
 
9.2%
e35239
 
7.9%
i26411
 
5.9%
p22451
 
5.0%
r21816
 
4.9%
t19216
 
4.3%
u18441
 
4.1%
n17760
 
4.0%
o17418
 
3.9%
Other values (58)178221
39.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter353183
79.0%
Uppercase Letter50064
 
11.2%
Space Separator41252
 
9.2%
Other Punctuation1317
 
0.3%
Dash Punctuation554
 
0.1%
Open Punctuation191
 
< 0.1%
Decimal Number147
 
< 0.1%
Modifier Symbol70
 
< 0.1%
Close Punctuation31
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a48584
13.8%
e35239
 
10.0%
i26411
 
7.5%
p22451
 
6.4%
r21816
 
6.2%
t19216
 
5.4%
u18441
 
5.2%
n17760
 
5.0%
o17418
 
4.9%
l17308
 
4.9%
Other values (16)108539
30.7%
Uppercase Letter
ValueCountFrequency (%)
M9314
18.6%
K5600
11.2%
N4439
 
8.9%
S3770
 
7.5%
A2729
 
5.5%
I2691
 
5.4%
W2531
 
5.1%
B2387
 
4.8%
L2107
 
4.2%
U1790
 
3.6%
Other values (15)12706
25.4%
Decimal Number
Value Count Frequency (%)
2 61 41.5%
3 55 37.4%
7 7 4.8%
1 7 4.8%
4 7 4.8%
5 4 2.7%
0 3 2.0%
6 3 2.0%
Other Punctuation
ValueCountFrequency (%)
'938
71.2%
/370
 
28.1%
&8
 
0.6%
:1
 
0.1%
Space Separator
ValueCountFrequency (%)
(space) 41252
100.0%
Dash Punctuation
ValueCountFrequency (%)
-554
100.0%
Open Punctuation
ValueCountFrequency (%)
(191
100.0%
Close Punctuation
ValueCountFrequency (%)
)31
100.0%
Modifier Symbol
ValueCountFrequency (%)
`70
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin403247
90.3%
Common43562
 
9.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
a48584
 
12.0%
e35239
 
8.7%
i26411
 
6.5%
p22451
 
5.6%
r21816
 
5.4%
t19216
 
4.8%
u18441
 
4.6%
n17760
 
4.4%
o17418
 
4.3%
l17308
 
4.3%
Other values (41)158603
39.3%
Common
Value Count Frequency (%)
(space) 41252 94.7%
' 938 2.2%
- 554 1.3%
/ 370 0.8%
( 191 0.4%
` 70 0.2%
2 61 0.1%
3 55 0.1%
) 31 0.1%
& 8 < 0.1%
Other values (7) 32 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII446809
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a48584
 
10.9%
(space) 41252
 
9.2%
e35239
 
7.9%
i26411
 
5.9%
p22451
 
5.0%
r21816
 
4.9%
t19216
 
4.3%
u18441
 
4.1%
n17760
 
4.0%
o17418
 
3.9%
Other values (58)178221
39.9%

permit
Boolean

HIGH CORRELATION
MISSING

Distinct2
Distinct (%)< 0.1%
Missing3056
Missing (%)5.1%
Memory size464.2 KiB
True
38852 
False
17492 
(Missing)
 
3056
ValueCountFrequency (%)
True38852
65.4%
False17492
29.4%
(Missing)3056
 
5.1%

construction_year
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct55
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1300.652475
Minimum0
Maximum2013
Zeros20709
Zeros (%)34.9%
Negative0
Negative (%)0.0%
Memory size464.2 KiB
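Zeros make up 34.9% of construction_year, which is why the mean (1300.65) and Q1 (0) do not look like years. If 0 is a placeholder for an unknown year (an assumption; the report only flags ZEROS), the report's own figures give the mean of the remaining rows directly: Sum / (59400 - 20709) = 77258757 / 38691 ≈ 1996.8. A sketch that recomputes summary statistics after excluding such a sentinel; `stats_excluding` and the sample years are illustrative.

```python
import statistics


def stats_excluding(xs, sentinel=0):
    """Mean and median after dropping a sentinel value (assumed to mean 'unknown')."""
    kept = [x for x in xs if x != sentinel]
    return {
        "n_excluded": len(xs) - len(kept),
        "mean": statistics.fmean(kept),
        "median": statistics.median(kept),
    }


print(stats_excluding([0, 0, 1986, 2004, 2010, 0, 1960]))
# Mean of the non-zero years implied by the report's Sum and Zeros figures
print(round(77258757 / (59400 - 20709), 1))  # 1996.8
```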

Quantile statistics

Minimum 0
5th percentile 0
Q1 0
Median 1986
Q3 2004
95th percentile 2010
Maximum 2013
Range 2013
Interquartile range (IQR) 2004

Descriptive statistics

Standard deviation951.6205473
Coefficient of variation (CV)0.7316485885
Kurtosis-1.596432369
Mean1300.652475
Median Absolute Deviation (MAD)22
Skewness-0.6349277866
Sum77258757
Variance905581.6661
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)

Common values
Value Count Frequency (%)
0 20709 34.9%
2010 2645 4.5%
2008 2613 4.4%
2009 2533 4.3%
2000 2091 3.5%
2007 1587 2.7%
2006 1471 2.5%
2003 1286 2.2%
2011 1256 2.1%
2004 1123 1.9%
Other values (45) 22086 37.2%

Minimum 10 values
Value Count Frequency (%)
0 20709 34.9%
1960 102 0.2%
1961 21 < 0.1%
1962 30 0.1%
1963 85 0.1%
1964 40 0.1%
1965 19 < 0.1%
1966 17 < 0.1%
1967 88 0.1%
1968 77 0.1%

Maximum 10 values
Value Count Frequency (%)
2013 176 0.3%
2012 1084 1.8%
2011 1256 2.1%
2010 2645 4.5%
2009 2533 4.3%
2008 2613 4.4%
2007 1587 2.7%
2006 1471 2.5%
2005 1011 1.7%
2004 1123 1.9%

extraction_type
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct18
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size464.2 KiB
gravity
26780 
nira/tanira
8154 
other
6430 
submersible
4764 
swn 80
3670 
Other values (13)
9602 

Length

Max length25
Median length7
Mean length7.719511785
Min length3

Characters and Unicode

Total characters 458539
Distinct characters 29
Distinct categories 5
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st rowgravity
2nd rowgravity
3rd rowgravity
4th rowsubmersible
5th rowgravity

Common Values

ValueCountFrequency (%)
gravity26780
45.1%
nira/tanira8154
 
13.7%
other6430
 
10.8%
submersible4764
 
8.0%
swn 803670
 
6.2%
mono2865
 
4.8%
india mark ii2400
 
4.0%
afridev1770
 
3.0%
ksb1415
 
2.4%
other - rope pump451
 
0.8%
Other values (8)701
 
1.2%

Length

Histogram of lengths of the category

Most occurring words

ValueCountFrequency (%)
gravity26780
38.1%
nira/tanira8154
 
11.6%
other7197
 
10.2%
submersible4764
 
6.8%
swn3899
 
5.5%
803670
 
5.2%
mono2865
 
4.1%
mark2498
 
3.6%
india2498
 
3.6%
ii2400
 
3.4%
Other values (13)5640
 
8.0%

Most occurring characters

ValueCountFrequency (%)
i60078
13.1%
r59768
13.0%
a58179
12.7%
t42131
9.2%
v28550
 
6.2%
y26867
 
5.9%
g26782
 
5.8%
n25691
 
5.6%
e19036
 
4.2%
s14844
 
3.2%
Other values (19)96613
21.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter430853
94.0%
Space Separator10965
 
2.4%
Other Punctuation8156
 
1.8%
Decimal Number7798
 
1.7%
Dash Punctuation767
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i60078
13.9%
r59768
13.9%
a58179
13.5%
t42131
9.8%
v28550
6.6%
y26867
 
6.2%
g26782
 
6.2%
n25691
 
6.0%
e19036
 
4.4%
s14844
 
3.4%
Other values (13)68927
16.0%
Decimal Number
Value Count Frequency (%)
8 3899 50.0%
0 3670 47.1%
1 229 2.9%
Space Separator
ValueCountFrequency (%)
(space) 10965
100.0%
Other Punctuation
ValueCountFrequency (%)
/8156
100.0%
Dash Punctuation
ValueCountFrequency (%)
-767
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin430853
94.0%
Common27686
 
6.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
i60078
13.9%
r59768
13.9%
a58179
13.5%
t42131
9.8%
v28550
6.6%
y26867
 
6.2%
g26782
 
6.2%
n25691
 
6.0%
e19036
 
4.4%
s14844
 
3.4%
Other values (13)68927
16.0%
Common
Value Count Frequency (%)
(space) 10965 39.6%
/ 8156 29.5%
8 3899 14.1%
0 3670 13.3%
- 767 2.8%
1 229 0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII458539
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i60078
13.1%
r59768
13.0%
a58179
12.7%
t42131
9.2%
v28550
 
6.2%
y26867
 
5.9%
g26782
 
5.8%
n25691
 
5.6%
e19036
 
4.2%
s14844
 
3.2%
Other values (19)96613
21.1%

extraction_type_group
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct13
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size464.2 KiB
gravity
26780 
nira/tanira
8154 
other
6430 
submersible
6179 
swn 80
3670 
Other values (8)
8187 

Length

Max length15
Median length7
Mean length7.880538721
Min length4

Characters and Unicode

Total characters 468104
Distinct characters 26
Distinct categories 5
Distinct scripts 2
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st rowgravity
2nd rowgravity
3rd rowgravity
4th rowsubmersible
5th rowgravity

Common Values

ValueCountFrequency (%)
gravity26780
45.1%
nira/tanira8154
 
13.7%
other6430
 
10.8%
submersible6179
 
10.4%
swn 803670
 
6.2%
mono2865
 
4.8%
india mark ii2400
 
4.0%
afridev1770
 
3.0%
rope pump451
 
0.8%
other handpump364
 
0.6%
Other values (3)337
 
0.6%

Length

Histogram of lengths of the category

Most occurring words

ValueCountFrequency (%)
gravity26780
38.8%
nira/tanira8154
 
11.8%
other6916
 
10.0%
submersible6179
 
9.0%
803670
 
5.3%
swn3670
 
5.3%
mono2865
 
4.2%
india2498
 
3.6%
mark2498
 
3.6%
ii2400
 
3.5%
Other values (7)3373
 
4.9%

Most occurring characters

ValueCountFrequency (%)
i61244
13.1%
r61141
13.1%
a58372
12.5%
t41972
9.0%
v28550
 
6.1%
g26780
 
5.7%
y26780
 
5.7%
n25822
 
5.5%
e21729
 
4.6%
s16028
 
3.4%
Other values (16)99686
21.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter442890
94.6%
Space Separator9603
 
2.1%
Other Punctuation8154
 
1.7%
Decimal Number7340
 
1.6%
Dash Punctuation117
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i61244
13.8%
r61141
13.8%
a58372
13.2%
t41972
9.5%
v28550
 
6.4%
g26780
 
6.0%
y26780
 
6.0%
n25822
 
5.8%
e21729
 
4.9%
s16028
 
3.6%
Other values (11)74472
16.8%
Decimal Number
Value Count Frequency (%)
8 3670 50.0%
0 3670 50.0%
Space Separator
ValueCountFrequency (%)
(space) 9603
100.0%
Other Punctuation
ValueCountFrequency (%)
/8154
100.0%
Dash Punctuation
ValueCountFrequency (%)
-117
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin442890
94.6%
Common25214
 
5.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
i61244
13.8%
r61141
13.8%
a58372
13.2%
t41972
9.5%
v28550
 
6.4%
g26780
 
6.0%
y26780
 
6.0%
n25822
 
5.8%
e21729
 
4.9%
s16028
 
3.6%
Other values (11)74472
16.8%
Common
ValueCountFrequency (%)
9603
38.1%
/8154
32.3%
83670
 
14.6%
03670
 
14.6%
-117
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII468104
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i61244
13.1%
r61141
13.1%
a58372
12.5%
t41972
9.0%
v28550
 
6.1%
g26780
 
5.7%
y26780
 
5.7%
n25822
 
5.5%
e21729
 
4.6%
s16028
 
3.4%
Other values (16)99686
21.3%

extraction_type_class
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Value | Count
gravity | 26780
handpump | 16456
other | 6430
submersible | 6179
motorpump | 2987
Other values (2) | 568

Length

Max length: 12
Median length: 7
Mean length: 7.602239057
Min length: 5
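Every row with the same value has the same length, so the length statistics above can be checked from the value counts alone. This is a minimal sketch. It assumes the median is taken over one length per row (59400 rows) and averages the two middle rows when the row count is even. `length_stats` is a hypothetical helper, not a pandas-profiling function. Applied to the seven `extraction_type_class` values in the Common Values table below, it returns max 12, median 7, min 5, 451573 total characters, and a mean of about 7.602239057.

```python
def length_stats(value_counts):
    """Length statistics over all rows, from a {value: row_count} mapping."""
    # One (length, rows) pair per distinct value, shortest first.
    pairs = sorted((len(value), rows) for value, rows in value_counts.items())
    n_rows = sum(rows for _, rows in pairs)
    total_chars = sum(length * rows for length, rows in pairs)

    def length_at(k):
        # Length of the k-th row (0-based) once rows are sorted by length.
        seen = 0
        for length, rows in pairs:
            seen += rows
            if k < seen:
                return length
        raise IndexError(k)

    half = n_rows // 2
    if n_rows % 2:
        median = length_at(half)
    else:
        median = (length_at(half - 1) + length_at(half)) / 2

    return {
        "max": pairs[-1][0],
        "median": median,
        "mean": total_chars / n_rows,
        "min": pairs[0][0],
        "total_characters": total_chars,
    }


extraction_type_class = {
    "gravity": 26780,
    "handpump": 16456,
    "other": 6430,
    "submersible": 6179,
    "motorpump": 2987,
    "rope pump": 451,
    "wind-powered": 117,
}
print(length_stats(extraction_type_class))
```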

Characters and Unicode

Total characters: 451573
Distinct characters: 21
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%
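Distinct counts the different values in a column. Unique, as the zero here suggests, counts only values that occur in exactly one row, and no pump class is that rare. The sketch below uses `collections.Counter` and assumes those two definitions; `distinct_stats` is a hypothetical name. It rebuilds the 59400-row column from the counts in the Common Values table below.

```python
from collections import Counter


def distinct_stats(column):
    """Distinct and unique counts for a list of values, plus percentages."""
    counts = Counter(column)
    n = len(column)
    distinct = len(counts)
    # Values that appear in exactly one row.
    unique = sum(1 for rows in counts.values() if rows == 1)
    return {
        "distinct": distinct,
        "distinct_pct": 100 * distinct / n,
        "unique": unique,
        "unique_pct": 100 * unique / n,
    }


value_counts = {
    "gravity": 26780, "handpump": 16456, "other": 6430, "submersible": 6179,
    "motorpump": 2987, "rope pump": 451, "wind-powered": 117,
}
# Expand the counts back into one entry per row.
column = [value for value, rows in value_counts.items() for _ in range(rows)]
print(distinct_stats(column))
```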

Sample

1st row: gravity
2nd row: gravity
3rd row: gravity
4th row: submersible
5th row: gravity

Common Values

Value | Count | Frequency (%)
gravity | 26780 | 45.1%
handpump | 16456 | 27.7%
other | 6430 | 10.8%
submersible | 6179 | 10.4%
motorpump | 2987 | 5.0%
rope pump | 451 | 0.8%
wind-powered | 117 | 0.2%

Length

Histogram of lengths of the category (image not included)

Pie chart (image not included)

Words

Value | Count | Frequency (%)
gravity | 26780 | 44.7%
handpump | 16456 | 27.5%
other | 6430 | 10.7%
submersible | 6179 | 10.3%
motorpump | 2987 | 5.0%
pump | 451 | 0.8%
rope | 451 | 0.8%
wind-powered | 117 | 0.2%
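The Words table appears to split each value on whitespace and to weight every word by the row count of its value. Under that reading, "rope pump" contributes one "rope" and one "pump" per row, while "wind-powered" stays a single word. That gives 59400 + 451 = 59851 words in total, and gravity's share is 26780 / 59851 ≈ 44.7%. The sketch below assumes this reading; `word_table` is a hypothetical helper.

```python
from collections import Counter


def word_table(value_counts):
    """(word, count, percent) rows, most frequent first.

    Each value is split on whitespace; every word is weighted by the
    number of rows holding that value.
    """
    words = Counter()
    for value, rows in value_counts.items():
        for word in value.split():
            words[word] += rows
    total = sum(words.values())
    return [(w, c, round(100 * c / total, 1)) for w, c in words.most_common()]


extraction_type_class = {
    "gravity": 26780, "handpump": 16456, "other": 6430, "submersible": 6179,
    "motorpump": 2987, "rope pump": 451, "wind-powered": 117,
}
for row in word_table(extraction_type_class):
    print(*row)
```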

Most occurring characters

Value | Count | Frequency (%)
a | 43236 | 9.6%
r | 42944 | 9.5%
p | 40356 | 8.9%
t | 36197 | 8.0%
i | 33076 | 7.3%
m | 29060 | 6.4%
g | 26780 | 5.9%
v | 26780 | 5.9%
y | 26780 | 5.9%
u | 26073 | 5.8%
Other values (11) | 120291 | 26.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 451005 | 99.9%
Space Separator | 451 | 0.1%
Dash Punctuation | 117 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 43236 | 9.6%
r | 42944 | 9.5%
p | 40356 | 8.9%
t | 36197 | 8.0%
i | 33076 | 7.3%
m | 29060 | 6.4%
g | 26780 | 5.9%
v | 26780 | 5.9%
y | 26780 | 5.9%
u | 26073 | 5.8%
Other values (9) | 119723 | 26.5%

Dash Punctuation
Value | Count | Frequency (%)
- | 117 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 451 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 451005 | 99.9%
Common | 568 | 0.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 43236 | 9.6%
r | 42944 | 9.5%
p | 40356 | 8.9%
t | 36197 | 8.0%
i | 33076 | 7.3%
m | 29060 | 6.4%
g | 26780 | 5.9%
v | 26780 | 5.9%
y | 26780 | 5.9%
u | 26073 | 5.8%
Other values (9) | 119723 | 26.5%

Common
Value | Count | Frequency (%)
(space) | 451 | 79.4%
- | 117 | 20.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 451573 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 43236 | 9.6%
r | 42944 | 9.5%
p | 40356 | 8.9%
t | 36197 | 8.0%
i | 33076 | 7.3%
m | 29060 | 6.4%
g | 26780 | 5.9%
v | 26780 | 5.9%
y | 26780 | 5.9%
u | 26073 | 5.8%
Other values (11) | 120291 | 26.6%

management
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Value | Count
vwc | 40507
wug | 6515
water board | 2933
wua | 2535
private operator | 1971
Other values (7) | 4939

Length

Max length: 16
Median length: 3
Mean length: 4.350639731
Min length: 3

Characters and Unicode

Total characters: 258428
Distinct characters: 23
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: vwc
2nd row: wug
3rd row: vwc
4th row: vwc
5th row: other

Common Values

Value | Count | Frequency (%)
vwc | 40507 | 68.2%
wug | 6515 | 11.0%
water board | 2933 | 4.9%
wua | 2535 | 4.3%
private operator | 1971 | 3.3%
parastatal | 1768 | 3.0%
water authority | 904 | 1.5%
other | 844 | 1.4%
company | 685 | 1.2%
unknown | 561 | 0.9%
Other values (2) | 177 | 0.3%

Length

Histogram of lengths of the category (image not included)

Words

Value | Count | Frequency (%)
vwc | 40507 | 61.9%
wug | 6515 | 10.0%
water | 3837 | 5.9%
board | 2933 | 4.5%
wua | 2535 | 3.9%
operator | 1971 | 3.0%
private | 1971 | 3.0%
parastatal | 1768 | 2.7%
other | 943 | 1.4%
authority | 904 | 1.4%
Other values (5) | 1522 | 2.3%

Most occurring characters

Value | Count | Frequency (%)
w | 53955 | 20.9%
v | 42478 | 16.4%
c | 41291 | 16.0%
a | 21908 | 8.5%
r | 16376 | 6.3%
t | 14222 | 5.5%
u | 10593 | 4.1%
o | 10166 | 3.9%
e | 8722 | 3.4%
g | 6515 | 2.5%
Other values (13) | 32202 | 12.5%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 252323 | 97.6%
Space Separator | 6006 | 2.3%
Dash Punctuation | 99 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
w | 53955 | 21.4%
v | 42478 | 16.8%
c | 41291 | 16.4%
a | 21908 | 8.7%
r | 16376 | 6.5%
t | 14222 | 5.6%
u | 10593 | 4.2%
o | 10166 | 4.0%
e | 8722 | 3.5%
g | 6515 | 2.6%
Other values (11) | 26097 | 10.3%

Space Separator
Value | Count | Frequency (%)
(space) | 6006 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 99 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 252323 | 97.6%
Common | 6105 | 2.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
w | 53955 | 21.4%
v | 42478 | 16.8%
c | 41291 | 16.4%
a | 21908 | 8.7%
r | 16376 | 6.5%
t | 14222 | 5.6%
u | 10593 | 4.2%
o | 10166 | 4.0%
e | 8722 | 3.5%
g | 6515 | 2.6%
Other values (11) | 26097 | 10.3%

Common
Value | Count | Frequency (%)
(space) | 6006 | 98.4%
- | 99 | 1.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 258428 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
w | 53955 | 20.9%
v | 42478 | 16.4%
c | 41291 | 16.0%
a | 21908 | 8.5%
r | 16376 | 6.3%
t | 14222 | 5.5%
u | 10593 | 4.1%
o | 10166 | 3.9%
e | 8722 | 3.4%
g | 6515 | 2.5%
Other values (13) | 32202 | 12.5%

management_group
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Value | Count
user-group | 52490
commercial | 3638
parastatal | 1768
other | 943
unknown | 561

Length

Max length: 10
Median length: 10
Mean length: 9.892289562
Min length: 5

Characters and Unicode

Total characters: 587602
Distinct characters: 18
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: user-group
2nd row: user-group
3rd row: user-group
4th row: user-group
5th row: other

Common Values

Value | Count | Frequency (%)
user-group | 52490 | 88.4%
commercial | 3638 | 6.1%
parastatal | 1768 | 3.0%
other | 943 | 1.6%
unknown | 561 | 0.9%
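Each Frequency (%) is the count divided by the 59400 observations and shown to one decimal place. Shares below 0.1%, such as the 7 dam rows of waterpoint_type, are printed as "< 0.1%". The sketch below implements that display rule, which is inferred from the figures in this report rather than taken from pandas-profiling's source.

```python
def frequency_table(value_counts, n_rows):
    """(value, count, label) rows sorted by count, with percentage labels."""
    rows = []
    for value, count in sorted(value_counts.items(), key=lambda kv: -kv[1]):
        pct = 100 * count / n_rows
        # Assumed rule: very small shares are shown as "< 0.1%".
        label = "< 0.1%" if pct < 0.1 else f"{pct:.1f}%"
        rows.append((value, count, label))
    return rows


management_group = {
    "user-group": 52490, "commercial": 3638, "parastatal": 1768,
    "other": 943, "unknown": 561,
}
for value, count, label in frequency_table(management_group, 59400):
    print(f"{value} | {count} | {label}")
```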

Length

Histogram of lengths of the category (image not included)

Pie chart (image not included)

Words

Value | Count | Frequency (%)
user-group | 52490 | 88.4%
commercial | 3638 | 6.1%
parastatal | 1768 | 3.0%
other | 943 | 1.6%
unknown | 561 | 0.9%

Most occurring characters

Value | Count | Frequency (%)
r | 111329 | 18.9%
u | 105541 | 18.0%
o | 57632 | 9.8%
e | 57071 | 9.7%
s | 54258 | 9.2%
p | 54258 | 9.2%
- | 52490 | 8.9%
g | 52490 | 8.9%
a | 10710 | 1.8%
c | 7276 | 1.2%
Other values (8) | 24547 | 4.2%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 535112 | 91.1%
Dash Punctuation | 52490 | 8.9%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
r | 111329 | 20.8%
u | 105541 | 19.7%
o | 57632 | 10.8%
e | 57071 | 10.7%
s | 54258 | 10.1%
p | 54258 | 10.1%
g | 52490 | 9.8%
a | 10710 | 2.0%
c | 7276 | 1.4%
m | 7276 | 1.4%
Other values (7) | 17271 | 3.2%

Dash Punctuation
Value | Count | Frequency (%)
- | 52490 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 535112 | 91.1%
Common | 52490 | 8.9%

Most frequent character per script

Latin
Value | Count | Frequency (%)
r | 111329 | 20.8%
u | 105541 | 19.7%
o | 57632 | 10.8%
e | 57071 | 10.7%
s | 54258 | 10.1%
p | 54258 | 10.1%
g | 52490 | 9.8%
a | 10710 | 2.0%
c | 7276 | 1.4%
m | 7276 | 1.4%
Other values (7) | 17271 | 3.2%

Common
Value | Count | Frequency (%)
- | 52490 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 587602 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
r | 111329 | 18.9%
u | 105541 | 18.0%
o | 57632 | 9.8%
e | 57071 | 9.7%
s | 54258 | 9.2%
p | 54258 | 9.2%
- | 52490 | 8.9%
g | 52490 | 8.9%
a | 10710 | 1.8%
c | 7276 | 1.2%
Other values (8) | 24547 | 4.2%

payment
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Value | Count
never pay | 25348
pay per bucket | 8985
pay monthly | 8300
unknown | 8157
pay when scheme fails | 3914
Other values (2) | 4696

Length

Max length: 21
Median length: 9
Mean length: 10.66479798
Min length: 5

Characters and Unicode

Total characters: 633489
Distinct characters: 21
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: pay annually
2nd row: never pay
3rd row: pay per bucket
4th row: never pay
5th row: never pay

Common Values

Value | Count | Frequency (%)
never pay | 25348 | 42.7%
pay per bucket | 8985 | 15.1%
pay monthly | 8300 | 14.0%
unknown | 8157 | 13.7%
pay when scheme fails | 3914 | 6.6%
pay annually | 3642 | 6.1%
other | 1054 | 1.8%

Length

Histogram of lengths of the category (image not included)

Pie chart (image not included)

Words

Value | Count | Frequency (%)
pay | 50189 | 39.7%
never | 25348 | 20.1%
per | 8985 | 7.1%
bucket | 8985 | 7.1%
monthly | 8300 | 6.6%
unknown | 8157 | 6.5%
scheme | 3914 | 3.1%
fails | 3914 | 3.1%
when | 3914 | 3.1%
annually | 3642 | 2.9%

Most occurring characters

Value | Count | Frequency (%)
e | 81462 | 12.9%
n | 69317 | 10.9%
(space) | 67002 | 10.6%
y | 62131 | 9.8%
a | 61387 | 9.7%
p | 59174 | 9.3%
r | 35387 | 5.6%
v | 25348 | 4.0%
u | 20784 | 3.3%
l | 19498 | 3.1%
Other values (11) | 131999 | 20.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 566487 | 89.4%
Space Separator | 67002 | 10.6%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 81462 | 14.4%
n | 69317 | 12.2%
y | 62131 | 11.0%
a | 61387 | 10.8%
p | 59174 | 10.4%
r | 35387 | 6.2%
v | 25348 | 4.5%
u | 20784 | 3.7%
l | 19498 | 3.4%
t | 18339 | 3.2%
Other values (10) | 113660 | 20.1%

Space Separator
Value | Count | Frequency (%)
(space) | 67002 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 566487 | 89.4%
Common | 67002 | 10.6%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 81462 | 14.4%
n | 69317 | 12.2%
y | 62131 | 11.0%
a | 61387 | 10.8%
p | 59174 | 10.4%
r | 35387 | 6.2%
v | 25348 | 4.5%
u | 20784 | 3.7%
l | 19498 | 3.4%
t | 18339 | 3.2%
Other values (10) | 113660 | 20.1%

Common
Value | Count | Frequency (%)
(space) | 67002 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 633489 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 81462 | 12.9%
n | 69317 | 10.9%
(space) | 67002 | 10.6%
y | 62131 | 9.8%
a | 61387 | 9.7%
p | 59174 | 9.3%
r | 35387 | 5.6%
v | 25348 | 4.0%
u | 20784 | 3.3%
l | 19498 | 3.1%
Other values (11) | 131999 | 20.8%

payment_type
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Value | Count
never pay | 25348
per bucket | 8985
monthly | 8300
unknown | 8157
on failure | 3914
Other values (2) | 4696

Length

Max length: 10
Median length: 9
Mean length: 8.530757576
Min length: 5

Characters and Unicode

Total characters: 506727
Distinct characters: 20
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: annually
2nd row: never pay
3rd row: per bucket
4th row: never pay
5th row: never pay

Common Values

Value | Count | Frequency (%)
never pay | 25348 | 42.7%
per bucket | 8985 | 15.1%
monthly | 8300 | 14.0%
unknown | 8157 | 13.7%
on failure | 3914 | 6.6%
annually | 3642 | 6.1%
other | 1054 | 1.8%

Length

Histogram of lengths of the category (image not included)

Pie chart (image not included)

Words

Value | Count | Frequency (%)
pay | 25348 | 26.0%
never | 25348 | 26.0%
per | 8985 | 9.2%
bucket | 8985 | 9.2%
monthly | 8300 | 8.5%
unknown | 8157 | 8.4%
failure | 3914 | 4.0%
on | 3914 | 4.0%
annually | 3642 | 3.7%
other | 1054 | 1.1%

Most occurring characters

Value | Count | Frequency (%)
e | 73634 | 14.5%
n | 69317 | 13.7%
r | 39301 | 7.8%
(space) | 38247 | 7.5%
y | 37290 | 7.4%
a | 36546 | 7.2%
p | 34333 | 6.8%
v | 25348 | 5.0%
u | 24698 | 4.9%
o | 21425 | 4.2%
Other values (10) | 106588 | 21.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 468480 | 92.5%
Space Separator | 38247 | 7.5%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 73634 | 15.7%
n | 69317 | 14.8%
r | 39301 | 8.4%
y | 37290 | 8.0%
a | 36546 | 7.8%
p | 34333 | 7.3%
v | 25348 | 5.4%
u | 24698 | 5.3%
o | 21425 | 4.6%
l | 19498 | 4.2%
Other values (9) | 87090 | 18.6%

Space Separator
Value | Count | Frequency (%)
(space) | 38247 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 468480 | 92.5%
Common | 38247 | 7.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 73634 | 15.7%
n | 69317 | 14.8%
r | 39301 | 8.4%
y | 37290 | 8.0%
a | 36546 | 7.8%
p | 34333 | 7.3%
v | 25348 | 5.4%
u | 24698 | 5.3%
o | 21425 | 4.6%
l | 19498 | 4.2%
Other values (9) | 87090 | 18.6%

Common
Value | Count | Frequency (%)
(space) | 38247 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 506727 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 73634 | 14.5%
n | 69317 | 13.7%
r | 39301 | 7.8%
(space) | 38247 | 7.5%
y | 37290 | 7.4%
a | 36546 | 7.2%
p | 34333 | 6.8%
v | 25348 | 5.0%
u | 24698 | 4.9%
o | 21425 | 4.2%
Other values (10) | 106588 | 21.0%

water_quality
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Value | Count
soft | 50818
salty | 4856
unknown | 1876
milky | 804
coloured | 490
Other values (3) | 556

Length

Max length: 18
Median length: 4
Mean length: 4.303282828
Min length: 4

Characters and Unicode

Total characters: 255615
Distinct characters: 19
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: soft
2nd row: soft
3rd row: soft
4th row: soft
5th row: soft

Common Values

Value | Count | Frequency (%)
soft | 50818 | 85.6%
salty | 4856 | 8.2%
unknown | 1876 | 3.2%
milky | 804 | 1.4%
coloured | 490 | 0.8%
salty abandoned | 339 | 0.6%
fluoride | 200 | 0.3%
fluoride abandoned | 17 | < 0.1%

Length

Histogram of lengths of the category (image not included)

Pie chart (image not included)

Words

Value | Count | Frequency (%)
soft | 50818 | 85.0%
salty | 5195 | 8.7%
unknown | 1876 | 3.1%
milky | 804 | 1.3%
coloured | 490 | 0.8%
abandoned | 356 | 0.6%
fluoride | 217 | 0.4%

Most occurring characters

Value | Count | Frequency (%)
s | 56013 | 21.9%
t | 56013 | 21.9%
o | 54247 | 21.2%
f | 51035 | 20.0%
l | 6706 | 2.6%
n | 6340 | 2.5%
y | 5999 | 2.3%
a | 5907 | 2.3%
k | 2680 | 1.0%
u | 2583 | 1.0%
Other values (9) | 8092 | 3.2%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 255259 | 99.9%
Space Separator | 356 | 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
s | 56013 | 21.9%
t | 56013 | 21.9%
o | 54247 | 21.3%
f | 51035 | 20.0%
l | 6706 | 2.6%
n | 6340 | 2.5%
y | 5999 | 2.4%
a | 5907 | 2.3%
k | 2680 | 1.0%
u | 2583 | 1.0%
Other values (8) | 7736 | 3.0%

Space Separator
Value | Count | Frequency (%)
(space) | 356 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 255259 | 99.9%
Common | 356 | 0.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
s | 56013 | 21.9%
t | 56013 | 21.9%
o | 54247 | 21.3%
f | 51035 | 20.0%
l | 6706 | 2.6%
n | 6340 | 2.5%
y | 5999 | 2.4%
a | 5907 | 2.3%
k | 2680 | 1.0%
u | 2583 | 1.0%
Other values (8) | 7736 | 3.0%

Common
Value | Count | Frequency (%)
(space) | 356 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 255615 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
s | 56013 | 21.9%
t | 56013 | 21.9%
o | 54247 | 21.2%
f | 51035 | 20.0%
l | 6706 | 2.6%
n | 6340 | 2.5%
y | 5999 | 2.3%
a | 5907 | 2.3%
k | 2680 | 1.0%
u | 2583 | 1.0%
Other values (9) | 8092 | 3.2%

quality_group
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Value | Count
good | 50818
salty | 5195
unknown | 1876
milky | 804
colored | 490

Length

Max length: 8
Median length: 4
Mean length: 4.23510101
Min length: 4

Characters and Unicode

Total characters: 251565
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: good
2nd row: good
3rd row: good
4th row: good
5th row: good

Common Values

Value | Count | Frequency (%)
good | 50818 | 85.6%
salty | 5195 | 8.7%
unknown | 1876 | 3.2%
milky | 804 | 1.4%
colored | 490 | 0.8%
fluoride | 217 | 0.4%
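quality_group looks like a coarser version of water_quality. Its salty count of 5195 equals 4856 salty plus 339 salty abandoned, and its fluoride count of 217 equals 200 + 17. The sketch below checks this by aggregating the water_quality counts through a mapping inferred from the labels. The mapping is an assumption. Matching totals are consistent with a row-by-row correspondence but do not prove one; confirming it would need a cross-tabulation of the raw columns.

```python
from collections import Counter


def coarsen(fine_counts, mapping):
    """Sum fine-grained value counts into the groups given by mapping."""
    grouped = Counter()
    for value, rows in fine_counts.items():
        grouped[mapping[value]] += rows
    return dict(grouped)


water_quality = {
    "soft": 50818, "salty": 4856, "unknown": 1876, "milky": 804,
    "coloured": 490, "salty abandoned": 339, "fluoride": 200,
    "fluoride abandoned": 17,
}
# Hypothetical mapping, inferred from the labels and counts only.
to_quality_group = {
    "soft": "good", "salty": "salty", "salty abandoned": "salty",
    "unknown": "unknown", "milky": "milky", "coloured": "colored",
    "fluoride": "fluoride", "fluoride abandoned": "fluoride",
}
quality_group = {
    "good": 50818, "salty": 5195, "unknown": 1876, "milky": 804,
    "colored": 490, "fluoride": 217,
}
print(coarsen(water_quality, to_quality_group) == quality_group)  # True
```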

Length

Histogram of lengths of the category (image not included)

Pie chart (image not included)

Words

Value | Count | Frequency (%)
good | 50818 | 85.6%
salty | 5195 | 8.7%
unknown | 1876 | 3.2%
milky | 804 | 1.4%
colored | 490 | 0.8%
fluoride | 217 | 0.4%

Most occurring characters

Value | Count | Frequency (%)
o | 104709 | 41.6%
d | 51525 | 20.5%
g | 50818 | 20.2%
l | 6706 | 2.7%
y | 5999 | 2.4%
n | 5628 | 2.2%
s | 5195 | 2.1%
a | 5195 | 2.1%
t | 5195 | 2.1%
k | 2680 | 1.1%
Other values (8) | 7915 | 3.1%
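The character table can be reproduced by counting each character once per row and keeping the ten most common. The remainder is folded into an "Other values (n)" row. This is a sketch under those assumptions, and `char_table` is a hypothetical helper. For quality_group, "good" alone contributes 2 × 50818 = 101636 of the 104709 o's, and the 8 leftover characters sum to 7915. Ties such as s, a and t (5195 each) keep first-seen order, because `Counter.most_common` orders equal counts by first occurrence.

```python
from collections import Counter


def char_table(value_counts, top=10):
    """Top characters by row-weighted count, plus an 'Other values' row."""
    chars = Counter()
    for value, rows in value_counts.items():
        for ch in value:
            chars[ch] += rows
    total = sum(chars.values())
    head = chars.most_common(top)
    rest = total - sum(count for _, count in head)
    table = [(ch, count, f"{100 * count / total:.1f}%") for ch, count in head]
    table.append(
        (f"Other values ({len(chars) - len(head)})", rest, f"{100 * rest / total:.1f}%")
    )
    return table


quality_group = {
    "good": 50818, "salty": 5195, "unknown": 1876, "milky": 804,
    "colored": 490, "fluoride": 217,
}
for row in char_table(quality_group):
    print(*row)
```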

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 251565 | 100.0%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
o | 104709 | 41.6%
d | 51525 | 20.5%
g | 50818 | 20.2%
l | 6706 | 2.7%
y | 5999 | 2.4%
n | 5628 | 2.2%
s | 5195 | 2.1%
a | 5195 | 2.1%
t | 5195 | 2.1%
k | 2680 | 1.1%
Other values (8) | 7915 | 3.1%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 251565 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
o | 104709 | 41.6%
d | 51525 | 20.5%
g | 50818 | 20.2%
l | 6706 | 2.7%
y | 5999 | 2.4%
n | 5628 | 2.2%
s | 5195 | 2.1%
a | 5195 | 2.1%
t | 5195 | 2.1%
k | 2680 | 1.1%
Other values (8) | 7915 | 3.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 251565 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
o | 104709 | 41.6%
d | 51525 | 20.5%
g | 50818 | 20.2%
l | 6706 | 2.7%
y | 5999 | 2.4%
n | 5628 | 2.2%
s | 5195 | 2.1%
a | 5195 | 2.1%
t | 5195 | 2.1%
k | 2680 | 1.1%
Other values (8) | 7915 | 3.1%

quantity
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Value | Count
enough | 33186
insufficient | 15129
dry | 6246
seasonal | 4050
unknown | 789

Length

Max length: 12
Median length: 6
Mean length: 7.362373737
Min length: 3

Characters and Unicode

Total characters: 437325
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: enough
2nd row: insufficient
3rd row: enough
4th row: dry
5th row: seasonal

Common Values

Value | Count | Frequency (%)
enough | 33186 | 55.9%
insufficient | 15129 | 25.5%
dry | 6246 | 10.5%
seasonal | 4050 | 6.8%
unknown | 789 | 1.3%

Length

Histogram of lengths of the category (image not included)

Pie chart (image not included)

Words

Value | Count | Frequency (%)
enough | 33186 | 55.9%
insufficient | 15129 | 25.5%
dry | 6246 | 10.5%
seasonal | 4050 | 6.8%
unknown | 789 | 1.3%

Most occurring characters

Value | Count | Frequency (%)
n | 69861 | 16.0%
e | 52365 | 12.0%
u | 49104 | 11.2%
i | 45387 | 10.4%
o | 38025 | 8.7%
g | 33186 | 7.6%
h | 33186 | 7.6%
f | 30258 | 6.9%
s | 23229 | 5.3%
c | 15129 | 3.5%
Other values (8) | 47595 | 10.9%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 437325 | 100.0%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
n | 69861 | 16.0%
e | 52365 | 12.0%
u | 49104 | 11.2%
i | 45387 | 10.4%
o | 38025 | 8.7%
g | 33186 | 7.6%
h | 33186 | 7.6%
f | 30258 | 6.9%
s | 23229 | 5.3%
c | 15129 | 3.5%
Other values (8) | 47595 | 10.9%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 437325 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
n | 69861 | 16.0%
e | 52365 | 12.0%
u | 49104 | 11.2%
i | 45387 | 10.4%
o | 38025 | 8.7%
g | 33186 | 7.6%
h | 33186 | 7.6%
f | 30258 | 6.9%
s | 23229 | 5.3%
c | 15129 | 3.5%
Other values (8) | 47595 | 10.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 437325 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
n | 69861 | 16.0%
e | 52365 | 12.0%
u | 49104 | 11.2%
i | 45387 | 10.4%
o | 38025 | 8.7%
g | 33186 | 7.6%
h | 33186 | 7.6%
f | 30258 | 6.9%
s | 23229 | 5.3%
c | 15129 | 3.5%
Other values (8) | 47595 | 10.9%

quantity_group
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Value | Count
enough | 33186
insufficient | 15129
dry | 6246
seasonal | 4050
unknown | 789

Length

Max length: 12
Median length: 6
Mean length: 7.362373737
Min length: 3

Characters and Unicode

Total characters: 437325
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: enough
2nd row: insufficient
3rd row: enough
4th row: dry
5th row: seasonal

Common Values

Value | Count | Frequency (%)
enough | 33186 | 55.9%
insufficient | 15129 | 25.5%
dry | 6246 | 10.5%
seasonal | 4050 | 6.8%
unknown | 789 | 1.3%

Length

Histogram of lengths of the category (image not included)

Pie chart (image not included)

Words

Value | Count | Frequency (%)
enough | 33186 | 55.9%
insufficient | 15129 | 25.5%
dry | 6246 | 10.5%
seasonal | 4050 | 6.8%
unknown | 789 | 1.3%

Most occurring characters

Value | Count | Frequency (%)
n | 69861 | 16.0%
e | 52365 | 12.0%
u | 49104 | 11.2%
i | 45387 | 10.4%
o | 38025 | 8.7%
g | 33186 | 7.6%
h | 33186 | 7.6%
f | 30258 | 6.9%
s | 23229 | 5.3%
c | 15129 | 3.5%
Other values (8) | 47595 | 10.9%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 437325 | 100.0%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
n | 69861 | 16.0%
e | 52365 | 12.0%
u | 49104 | 11.2%
i | 45387 | 10.4%
o | 38025 | 8.7%
g | 33186 | 7.6%
h | 33186 | 7.6%
f | 30258 | 6.9%
s | 23229 | 5.3%
c | 15129 | 3.5%
Other values (8) | 47595 | 10.9%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 437325 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
n | 69861 | 16.0%
e | 52365 | 12.0%
u | 49104 | 11.2%
i | 45387 | 10.4%
o | 38025 | 8.7%
g | 33186 | 7.6%
h | 33186 | 7.6%
f | 30258 | 6.9%
s | 23229 | 5.3%
c | 15129 | 3.5%
Other values (8) | 47595 | 10.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 437325 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
n | 69861 | 16.0%
e | 52365 | 12.0%
u | 49104 | 11.2%
i | 45387 | 10.4%
o | 38025 | 8.7%
g | 33186 | 7.6%
h | 33186 | 7.6%
f | 30258 | 6.9%
s | 23229 | 5.3%
c | 15129 | 3.5%
Other values (8) | 47595 | 10.9%

source
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Value | Count
spring | 17021
shallow well | 16824
machine dbh | 11075
river | 9612
rainwater harvesting | 2295
Other values (5) | 2573

Length

Max length: 20
Median length: 11
Mean length: 8.978804714
Min length: 3

Characters and Unicode

Total characters: 533341
Distinct characters: 21
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: spring
2nd row: rainwater harvesting
3rd row: dam
4th row: machine dbh
5th row: rainwater harvesting

Common Values

Value | Count | Frequency (%)
spring | 17021 | 28.7%
shallow well | 16824 | 28.3%
machine dbh | 11075 | 18.6%
river | 9612 | 16.2%
rainwater harvesting | 2295 | 3.9%
hand dtw | 874 | 1.5%
lake | 765 | 1.3%
dam | 656 | 1.1%
other | 212 | 0.4%
unknown | 66 | 0.1%

Length

Histogram of lengths of the category (image not included)

Pie chart (image not included)

Words

Value | Count | Frequency (%)
spring | 17021 | 18.8%
shallow | 16824 | 18.6%
well | 16824 | 18.6%
dbh | 11075 | 12.2%
machine | 11075 | 12.2%
river | 9612 | 10.6%
harvesting | 2295 | 2.5%
rainwater | 2295 | 2.5%
hand | 874 | 1.0%
dtw | 874 | 1.0%
Other values (4) | 1699 | 1.9%

Most occurring characters

Value | Count | Frequency (%)
l | 68061 | 12.8%
r | 43342 | 8.1%
e | 43078 | 8.1%
h | 42355 | 7.9%
i | 42298 | 7.9%
a | 37079 | 7.0%
w | 36883 | 6.9%
s | 36140 | 6.8%
n | 33758 | 6.3%
(space) | 31068 | 5.8%
Other values (11) | 119279 | 22.4%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 502273 | 94.2%
Space Separator | 31068 | 5.8%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
l | 68061 | 13.6%
r | 43342 | 8.6%
e | 43078 | 8.6%
h | 42355 | 8.4%
i | 42298 | 8.4%
a | 37079 | 7.4%
w | 36883 | 7.3%
s | 36140 | 7.2%
n | 33758 | 6.7%
g | 19316 | 3.8%
Other values (10) | 99963 | 19.9%

Space Separator
Value | Count | Frequency (%)
(space) | 31068 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 502273 | 94.2%
Common | 31068 | 5.8%

Most frequent character per script

Latin
Value | Count | Frequency (%)
l | 68061 | 13.6%
r | 43342 | 8.6%
e | 43078 | 8.6%
h | 42355 | 8.4%
i | 42298 | 8.4%
a | 37079 | 7.4%
w | 36883 | 7.3%
s | 36140 | 7.2%
n | 33758 | 6.7%
g | 19316 | 3.8%
Other values (10) | 99963 | 19.9%

Common
Value | Count | Frequency (%)
(space) | 31068 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 533341 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
l | 68061 | 12.8%
r | 43342 | 8.1%
e | 43078 | 8.1%
h | 42355 | 7.9%
i | 42298 | 7.9%
a | 37079 | 7.0%
w | 36883 | 6.9%
s | 36140 | 6.8%
n | 33758 | 6.3%
(space) | 31068 | 5.8%
Other values (11) | 119279 | 22.4%

source_type
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Value | Count
spring | 17021
shallow well | 16824
borehole | 11949
river/lake | 10377
rainwater harvesting | 2295
Other values (2) | 934

Length

Max length: 20
Median length: 8
Mean length: 9.303602694
Min length: 3

Characters and Unicode

Total characters: 552634
Distinct characters: 20
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: spring
2nd row: rainwater harvesting
3rd row: dam
4th row: borehole
5th row: rainwater harvesting

Common Values

Value | Count | Frequency (%)
spring | 17021 | 28.7%
shallow well | 16824 | 28.3%
borehole | 11949 | 20.1%
river/lake | 10377 | 17.5%
rainwater harvesting | 2295 | 3.9%
dam | 656 | 1.1%
other | 278 | 0.5%

Length

Histogram of lengths of the category (image not included)

Pie chart (image not included)

Words

Value | Count | Frequency (%)
spring | 17021 | 21.7%
well | 16824 | 21.4%
shallow | 16824 | 21.4%
borehole | 11949 | 15.2%
river/lake | 10377 | 13.2%
rainwater | 2295 | 2.9%
harvesting | 2295 | 2.9%
dam | 656 | 0.8%
other | 278 | 0.4%

Most occurring characters

Value | Count | Frequency (%)
l | 89622 | 16.2%
e | 66344 | 12.0%
r | 56887 | 10.3%
o | 41000 | 7.4%
s | 36140 | 6.5%
w | 35943 | 6.5%
a | 34742 | 6.3%
i | 31988 | 5.8%
h | 31346 | 5.7%
n | 21611 | 3.9%
Other values (10) | 107011 | 19.4%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 523138 | 94.7%
Space Separator | 19119 | 3.5%
Other Punctuation | 10377 | 1.9%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
l | 89622 | 17.1%
e | 66344 | 12.7%
r | 56887 | 10.9%
o | 41000 | 7.8%
s | 36140 | 6.9%
w | 35943 | 6.9%
a | 34742 | 6.6%
i | 31988 | 6.1%
h | 31346 | 6.0%
n | 21611 | 4.1%
Other values (8) | 77515 | 14.8%

Space Separator
Value | Count | Frequency (%)
(space) | 19119 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 10377 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 523138 | 94.7%
Common | 29496 | 5.3%
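For plain ASCII text, the script and block tables follow simple rules. In the Unicode Script property, the ASCII letters A to Z and a to z are Latin, and every other ASCII character (space, digits, punctuation) is Common. All of U+0000 to U+007F falls in a single block, labelled ASCII in this report; Unicode's own name for that block is Basic Latin. The sketch below applies only these rules, so it is not a general script lookup, and the helper names are hypothetical. For source_type it counts 16824 + 2295 spaces plus 10377 slashes, giving 29496 Common characters, which matches the table above.

```python
from collections import Counter


def ascii_script_and_block(ch):
    """Script and block of an ASCII character (sketch: ASCII only)."""
    if ord(ch) > 0x7F:
        raise ValueError(f"{ch!r} is outside ASCII; this sketch does not handle it")
    # ASCII letters have Script=Latin; all other ASCII characters are Common.
    script = "Latin" if ch.isalpha() else "Common"
    return script, "ASCII"


def script_and_block_counts(value_counts):
    """Row-weighted character counts per script and per block."""
    scripts, blocks = Counter(), Counter()
    for value, rows in value_counts.items():
        for ch in value:
            script, block = ascii_script_and_block(ch)
            scripts[script] += rows
            blocks[block] += rows
    return scripts, blocks


source_type = {
    "spring": 17021, "shallow well": 16824, "borehole": 11949,
    "river/lake": 10377, "rainwater harvesting": 2295, "dam": 656,
    "other": 278,
}
scripts, blocks = script_and_block_counts(source_type)
print(dict(scripts), dict(blocks))
```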

Most frequent character per script

Latin
Value | Count | Frequency (%)
l | 89622 | 17.1%
e | 66344 | 12.7%
r | 56887 | 10.9%
o | 41000 | 7.8%
s | 36140 | 6.9%
w | 35943 | 6.9%
a | 34742 | 6.6%
i | 31988 | 6.1%
h | 31346 | 6.0%
n | 21611 | 4.1%
Other values (8) | 77515 | 14.8%

Common
Value | Count | Frequency (%)
(space) | 19119 | 64.8%
/ | 10377 | 35.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 552634 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
l | 89622 | 16.2%
e | 66344 | 12.0%
r | 56887 | 10.3%
o | 41000 | 7.4%
s | 36140 | 6.5%
w | 35943 | 6.5%
a | 34742 | 6.3%
i | 31988 | 5.8%
h | 31346 | 5.7%
n | 21611 | 3.9%
Other values (10) | 107011 | 19.4%

source_class
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Value | Count
groundwater | 45794
surface | 13328
unknown | 278

Length

Max length: 11
Median length: 11
Mean length: 10.08377104
Min length: 7

Characters and Unicode

Total characters: 598976
Distinct characters: 14
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: groundwater
2nd row: surface
3rd row: surface
4th row: groundwater
5th row: surface

Common Values

Value | Count | Frequency (%)
groundwater | 45794 | 77.1%
surface | 13328 | 22.4%
unknown | 278 | 0.5%
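The value counts also line up across the three source columns. machine dbh (11075) and hand dtw (874) add up to borehole's 11949, river and lake together give river/lake's 10377, and groundwater's 45794 is spring + shallow well + borehole. The sketch below rolls source up to source_class through two hypothetical mappings inferred from those labels. Agreement of marginal counts is suggestive only; a cross-tabulation of the raw columns would be needed to confirm the hierarchy.

```python
from collections import Counter


def roll_up(counts, mapping):
    """Aggregate {value: rows} into {group: rows} using mapping."""
    out = Counter()
    for value, rows in counts.items():
        out[mapping[value]] += rows
    return dict(out)


source = {
    "spring": 17021, "shallow well": 16824, "machine dbh": 11075,
    "river": 9612, "rainwater harvesting": 2295, "hand dtw": 874,
    "lake": 765, "dam": 656, "other": 212, "unknown": 66,
}
# Hypothetical mappings, inferred from labels and counts in this report.
source_to_type = {
    "spring": "spring", "shallow well": "shallow well",
    "machine dbh": "borehole", "hand dtw": "borehole",
    "river": "river/lake", "lake": "river/lake",
    "rainwater harvesting": "rainwater harvesting", "dam": "dam",
    "other": "other", "unknown": "other",
}
type_to_class = {
    "spring": "groundwater", "shallow well": "groundwater",
    "borehole": "groundwater", "river/lake": "surface",
    "rainwater harvesting": "surface", "dam": "surface", "other": "unknown",
}
source_type = roll_up(source, source_to_type)
source_class = roll_up(source_type, type_to_class)
print(source_type)
print(source_class)  # {'groundwater': 45794, 'surface': 13328, 'unknown': 278}
```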

Length

Histogram of lengths of the category (image not included)

Pie chart (image not included)

Words

Value | Count | Frequency (%)
groundwater | 45794 | 77.1%
surface | 13328 | 22.4%
unknown | 278 | 0.5%

Most occurring characters

Value | Count | Frequency (%)
r | 104916 | 17.5%
u | 59400 | 9.9%
a | 59122 | 9.9%
e | 59122 | 9.9%
n | 46628 | 7.8%
o | 46072 | 7.7%
w | 46072 | 7.7%
g | 45794 | 7.6%
d | 45794 | 7.6%
t | 45794 | 7.6%
Other values (4) | 40262 | 6.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 598976 | 100.0%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
r | 104916 | 17.5%
u | 59400 | 9.9%
a | 59122 | 9.9%
e | 59122 | 9.9%
n | 46628 | 7.8%
o | 46072 | 7.7%
w | 46072 | 7.7%
g | 45794 | 7.6%
d | 45794 | 7.6%
t | 45794 | 7.6%
Other values (4) | 40262 | 6.7%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 598976 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
r | 104916 | 17.5%
u | 59400 | 9.9%
a | 59122 | 9.9%
e | 59122 | 9.9%
n | 46628 | 7.8%
o | 46072 | 7.7%
w | 46072 | 7.7%
g | 45794 | 7.6%
d | 45794 | 7.6%
t | 45794 | 7.6%
Other values (4) | 40262 | 6.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 598976 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
r | 104916 | 17.5%
u | 59400 | 9.9%
a | 59122 | 9.9%
e | 59122 | 9.9%
n | 46628 | 7.8%
o | 46072 | 7.7%
w | 46072 | 7.7%
g | 45794 | 7.6%
d | 45794 | 7.6%
t | 45794 | 7.6%
Other values (4) | 40262 | 6.7%

waterpoint_type
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

Value | Count
communal standpipe | 28522
hand pump | 17488
other | 6380
communal standpipe multiple | 6103
improved spring | 784
Other values (2) | 123

Length

Max length: 27
Median length: 18
Mean length: 14.82757576
Min length: 3

Characters and Unicode

Total characters: 880758
Distinct characters: 18
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: communal standpipe
2nd row: communal standpipe
3rd row: communal standpipe multiple
4th row: communal standpipe multiple
5th row: communal standpipe

Common Values

Value | Count | Frequency (%)
communal standpipe | 28522 | 48.0%
hand pump | 17488 | 29.4%
other | 6380 | 10.7%
communal standpipe multiple | 6103 | 10.3%
improved spring | 784 | 1.3%
cattle trough | 116 | 0.2%
dam | 7 | < 0.1%

Length

Histogram of lengths of the category (image not included)

Pie chart (image not included)

Words

Value | Count | Frequency (%)
communal | 34625 | 29.2%
standpipe | 34625 | 29.2%
pump | 17488 | 14.8%
hand | 17488 | 14.8%
other | 6380 | 5.4%
multiple | 6103 | 5.1%
spring | 784 | 0.7%
improved | 784 | 0.7%
trough | 116 | 0.1%
cattle | 116 | 0.1%

Most occurring characters

ValueCountFrequency (%)
p111897
12.7%
m93632
10.6%
n87522
9.9%
a86861
9.9%
59116
 
6.7%
u58332
 
6.6%
d52904
 
6.0%
e48008
 
5.5%
t47456
 
5.4%
l46947
 
5.3%
Other values (8)188083
21.4%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 821642 | 93.3%
Space Separator | 59116 | 6.7%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
p | 111897 | 13.6%
m | 93632 | 11.4%
n | 87522 | 10.7%
a | 86861 | 10.6%
u | 58332 | 7.1%
d | 52904 | 6.4%
e | 48008 | 5.8%
t | 47456 | 5.8%
l | 46947 | 5.7%
i | 42296 | 5.1%
Other values (7) | 145787 | 17.7%
Space Separator
Value | Count | Frequency (%)
(space) | 59116 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 821642 | 93.3%
Common | 59116 | 6.7%

Most frequent character per script

Latin
Value | Count | Frequency (%)
p | 111897 | 13.6%
m | 93632 | 11.4%
n | 87522 | 10.7%
a | 86861 | 10.6%
u | 58332 | 7.1%
d | 52904 | 6.4%
e | 48008 | 5.8%
t | 47456 | 5.8%
l | 46947 | 5.7%
i | 42296 | 5.1%
Other values (7) | 145787 | 17.7%
Common
Value | Count | Frequency (%)
(space) | 59116 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 880758 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
p | 111897 | 12.7%
m | 93632 | 10.6%
n | 87522 | 9.9%
a | 86861 | 9.9%
(space) | 59116 | 6.7%
u | 58332 | 6.6%
d | 52904 | 6.0%
e | 48008 | 5.5%
t | 47456 | 5.4%
l | 46947 | 5.3%
Other values (8) | 188083 | 21.4%

waterpoint_type_group
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

communal standpipe | 34625
hand pump | 17488
other | 6380
improved spring | 784
cattle trough | 116

Length

Max length: 18
Median length: 18
Mean length: 13.90287879
Min length: 3

Characters and Unicode

Total characters: 825831
Distinct characters: 18
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: communal standpipe
2nd row: communal standpipe
3rd row: communal standpipe
4th row: communal standpipe
5th row: communal standpipe

Common Values

Value | Count | Frequency (%)
communal standpipe | 34625 | 58.3%
hand pump | 17488 | 29.4%
other | 6380 | 10.7%
improved spring | 784 | 1.3%
cattle trough | 116 | 0.2%
dam | 7 | < 0.1%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart of the category frequencies]

Most occurring words

Value | Count | Frequency (%)
communal | 34625 | 30.8%
standpipe | 34625 | 30.8%
pump | 17488 | 15.6%
hand | 17488 | 15.6%
other | 6380 | 5.7%
spring | 784 | 0.7%
improved | 784 | 0.7%
trough | 116 | 0.1%
cattle | 116 | 0.1%
dam | 7 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
p | 105794 | 12.8%
m | 87529 | 10.6%
n | 87522 | 10.6%
a | 86861 | 10.5%
(space) | 53013 | 6.4%
d | 52904 | 6.4%
u | 52229 | 6.3%
o | 41905 | 5.1%
e | 41905 | 5.1%
t | 41353 | 5.0%
Other values (8) | 174816 | 21.2%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 772818 | 93.6%
Space Separator | 53013 | 6.4%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
p | 105794 | 13.7%
m | 87529 | 11.3%
n | 87522 | 11.3%
a | 86861 | 11.2%
d | 52904 | 6.8%
u | 52229 | 6.8%
o | 41905 | 5.4%
e | 41905 | 5.4%
t | 41353 | 5.4%
i | 36193 | 4.7%
Other values (7) | 138623 | 17.9%
Space Separator
Value | Count | Frequency (%)
(space) | 53013 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 772818 | 93.6%
Common | 53013 | 6.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
p | 105794 | 13.7%
m | 87529 | 11.3%
n | 87522 | 11.3%
a | 86861 | 11.2%
d | 52904 | 6.8%
u | 52229 | 6.8%
o | 41905 | 5.4%
e | 41905 | 5.4%
t | 41353 | 5.4%
i | 36193 | 4.7%
Other values (7) | 138623 | 17.9%
Common
Value | Count | Frequency (%)
(space) | 53013 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 825831 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
p | 105794 | 12.8%
m | 87529 | 10.6%
n | 87522 | 10.6%
a | 86861 | 10.5%
(space) | 53013 | 6.4%
d | 52904 | 6.4%
u | 52229 | 6.3%
o | 41905 | 5.1%
e | 41905 | 5.1%
t | 41353 | 5.0%
Other values (8) | 174816 | 21.2%

status_group
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.2 KiB

functional | 32259
non functional | 22824
functional needs repair | 4317

Length

Max length: 23
Median length: 10
Mean length: 12.48176768
Min length: 10

Characters and Unicode

Total characters: 741417
Distinct characters: 15
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: functional
2nd row: functional
3rd row: functional
4th row: non functional
5th row: functional

Common Values

Value | Count | Frequency (%)
functional | 32259 | 54.3%
non functional | 22824 | 38.4%
functional needs repair | 4317 | 7.3%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart of the category frequencies]

Most occurring words

Value | Count | Frequency (%)
functional | 59400 | 65.4%
non | 22824 | 25.1%
repair | 4317 | 4.8%
needs | 4317 | 4.8%

Most occurring characters

Value | Count | Frequency (%)
n | 168765 | 22.8%
o | 82224 | 11.1%
i | 63717 | 8.6%
a | 63717 | 8.6%
f | 59400 | 8.0%
u | 59400 | 8.0%
c | 59400 | 8.0%
t | 59400 | 8.0%
l | 59400 | 8.0%
(space) | 31458 | 4.2%
Other values (5) | 34536 | 4.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 709959 | 95.8%
Space Separator | 31458 | 4.2%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
n | 168765 | 23.8%
o | 82224 | 11.6%
i | 63717 | 9.0%
a | 63717 | 9.0%
f | 59400 | 8.4%
u | 59400 | 8.4%
c | 59400 | 8.4%
t | 59400 | 8.4%
l | 59400 | 8.4%
e | 12951 | 1.8%
Other values (4) | 21585 | 3.0%
Space Separator
Value | Count | Frequency (%)
(space) | 31458 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 709959 | 95.8%
Common | 31458 | 4.2%

Most frequent character per script

Latin
Value | Count | Frequency (%)
n | 168765 | 23.8%
o | 82224 | 11.6%
i | 63717 | 9.0%
a | 63717 | 9.0%
f | 59400 | 8.4%
u | 59400 | 8.4%
c | 59400 | 8.4%
t | 59400 | 8.4%
l | 59400 | 8.4%
e | 12951 | 1.8%
Other values (4) | 21585 | 3.0%
Common
Value | Count | Frequency (%)
(space) | 31458 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 741417 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
n | 168765 | 22.8%
o | 82224 | 11.1%
i | 63717 | 8.6%
a | 63717 | 8.6%
f | 59400 | 8.0%
u | 59400 | 8.0%
c | 59400 | 8.0%
t | 59400 | 8.0%
l | 59400 | 8.0%
(space) | 31458 | 4.2%
Other values (5) | 34536 | 4.7%

Interactions

[Figure: 100 pairwise interaction plots, one for each ordered pair of the 10 numeric variables]

Correlations

Pearson's r

[Figure: Pearson's r correlation heatmap]

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative linear correlation, 0 indicates no linear correlation, and +1 indicates a perfect positive linear correlation. r is also invariant under separate changes in location and (positive) scale of the two variables. As a result, for an exactly linear relationship r is +1 or -1 whatever the slope of the line, provided the slope is not zero; only the sign of the slope matters.

To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
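
The definition above translates directly into code. The function below is a minimal, dependency-free sketch; the name `pearson_r` is illustrative. The two calls show that a perfect linear relationship gives r = +1 or -1 regardless of how steep the line is.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n normalisation cancels between numerator and denominator,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0  (slope +2)
print(pearson_r([1, 2, 3, 4], [7, 4, 1, -2]))  # -1.0 (slope -3)
```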

Spearman's ρ

[Figure: Spearman's ρ correlation heatmap]

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, so it is better than Pearson's r at capturing non-linear monotonic relationships. Its value lies between -1 and +1: -1 indicates a perfect negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates a perfect positive monotonic correlation.

To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations; in other words, compute Pearson's r on the ranks.
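
Here is a minimal sketch of the rank-based definition. It assumes that tied values receive the average of the ranks they span, which is the usual convention. `average_ranks`, `pearson_r` and `spearman_rho` are illustrative names. The example uses a monotonic but non-linear relationship: ρ is exactly 1, while Pearson's r falls short of 1.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the (average) ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]      # monotonic but not linear
print(spearman_rho(xs, ys))    # 1.0
print(pearson_r(xs, ys) < 1)   # True
```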

Kendall's τ

[Figure: Kendall's τ correlation heatmap]

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative correlation, 0 indicates no correlation, and +1 indicates a perfect positive correlation.

To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2. This is the τ_a form; variants such as τ_b adjust the denominator for ties.
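
The pair-counting definition can be sketched as follows. This is the straightforward O(n²) τ_a form described above; `kendall_tau_a` is an illustrative name. Library implementations such as SciPy's `scipy.stats.kendalltau` default to the tie-corrected τ_b instead.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / n(n - 1)/2.

    Pairs tied in x or y count as neither concordant nor discordant,
    but they still appear in the denominator.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair oppositely
    return (concordant - discordant) / (n * (n - 1) / 2)


# 5 concordant pairs and 1 discordant pair out of 6: (5 - 1) / 6
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.6666666666666666
```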

Phik (φk)

[Figure: Phik (φk) correlation heatmap]

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependencies, and reduces to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for the phik package.

Cramér's V (φc)

[Figure: Cramér's V correlation heatmap]

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The naive empirical estimator of Cramér's V has been shown to be biased, even for large samples, so the report uses the bias-corrected measure proposed by Bergsma (2013).
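
Below is a dependency-free sketch of a bias-corrected Cramér's V. It assumes the commonly quoted form of Bergsma's correction, which has three parts:

- φ² = χ²/n is reduced by (k - 1)(r - 1)/(n - 1) and clipped at zero.
- The table dimensions r and k are each reduced by (dimension - 1)²/(n - 1).
- The corrected V is √(φ̃² / min(k̃ - 1, r̃ - 1)).

The function name is illustrative. It uses the plain Pearson χ² with no continuity correction, and it is not claimed to match the report's implementation exactly.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long nominal sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    joint = Counter(zip(xs, ys))
    row_totals, col_totals = Counter(xs), Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the r x k contingency table.
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias corrections for phi^2 and for the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        raise ValueError("each variable needs at least two categories")
    return math.sqrt(phi2_corr / denom)


# Every combination occurs equally often, so the variables are independent.
xs = ["a", "a", "b", "b"] * 25
ys = ["x", "y", "x", "y"] * 25
print(cramers_v_corrected(xs, ys))  # 0.0
```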

Missing values

[Figure: nullity by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
[Figure: nullity correlation heatmap]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
[Figure: nullity dendrogram]
The dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
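
The sketch below assumes that nullity correlation means the Pearson correlation between 0/1 missingness indicators, with 1 marking a missing cell. The column values are hypothetical toy data loosely modelled on `funder` and `installer`. `nullity_correlation` is an illustrative helper, not the report's code.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns.

    None marks a missing cell. A column that is always or never missing has
    zero variance, so the correlation is undefined and None is returned.
    """
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)


# Hypothetical toy columns: installer is missing exactly where funder is.
funder = ["Roman", None, "Unicef", None]
installer = ["Roman", None, "UNICEF", None]
print(nullity_correlation(funder, installer))  # 1.0
```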

Sample

First rows

(index) | id | amount_tsh | date_recorded | funder | gps_height | installer | longitude | latitude | wpt_name | num_private | basin | subvillage | region | region_code | district_code | lga | ward | population | public_meeting | recorded_by | scheme_management | scheme_name | permit | construction_year | extraction_type | extraction_type_group | extraction_type_class | management | management_group | payment | payment_type | water_quality | quality_group | quantity | quantity_group | source | source_type | source_class | waterpoint_type | waterpoint_type_group | status_group
0 | 69572 | 6000.0 | 2011-03-14 | Roman | 1390 | Roman | 34.938093 | -9.856322 | none | 0 | Lake Nyasa | Mnyusi B | Iringa | 11 | 5 | Ludewa | Mundindi | 109 | True | GeoData Consultants Ltd | VWC | Roman | False | 1999 | gravity | gravity | gravity | vwc | user-group | pay annually | annually | soft | good | enough | enough | spring | spring | groundwater | communal standpipe | communal standpipe | functional
1 | 8776 | 0.0 | 2013-03-06 | Grumeti | 1399 | GRUMETI | 34.698766 | -2.147466 | Zahanati | 0 | Lake Victoria | Nyamara | Mara | 20 | 2 | Serengeti | Natta | 280 | NaN | GeoData Consultants Ltd | Other | NaN | True | 2010 | gravity | gravity | gravity | wug | user-group | never pay | never pay | soft | good | insufficient | insufficient | rainwater harvesting | rainwater harvesting | surface | communal standpipe | communal standpipe | functional
2 | 34310 | 25.0 | 2013-02-25 | Lottery Club | 686 | World vision | 37.460664 | -3.821329 | Kwa Mahundi | 0 | Pangani | Majengo | Manyara | 21 | 4 | Simanjiro | Ngorika | 250 | True | GeoData Consultants Ltd | VWC | Nyumba ya mungu pipe scheme | True | 2009 | gravity | gravity | gravity | vwc | user-group | pay per bucket | per bucket | soft | good | enough | enough | dam | dam | surface | communal standpipe multiple | communal standpipe | functional
3 | 67743 | 0.0 | 2013-01-28 | Unicef | 263 | UNICEF | 38.486161 | -11.155298 | Zahanati Ya Nanyumbu | 0 | Ruvuma / Southern Coast | Mahakamani | Mtwara | 90 | 63 | Nanyumbu | Nanyumbu | 58 | True | GeoData Consultants Ltd | VWC | NaN | True | 1986 | submersible | submersible | submersible | vwc | user-group | never pay | never pay | soft | good | dry | dry | machine dbh | borehole | groundwater | communal standpipe multiple | communal standpipe | non functional
4 | 19728 | 0.0 | 2011-07-13 | Action In A | 0 | Artisan | 31.130847 | -1.825359 | Shuleni | 0 | Lake Victoria | Kyanyamisa | Kagera | 18 | 1 | Karagwe | Nyakasimbi | 0 | True | GeoData Consultants Ltd | NaN | NaN | True | 0 | gravity | gravity | gravity | other | other | never pay | never pay | soft | good | seasonal | seasonal | rainwater harvesting | rainwater harvesting | surface | communal standpipe | communal standpipe | functional
5 | 9944 | 20.0 | 2011-03-13 | Mkinga Distric Coun | 0 | DWE | 39.172796 | -4.765587 | Tajiri | 0 | Pangani | Moa/Mwereme | Tanga | 4 | 8 | Mkinga | Moa | 1 | True | GeoData Consultants Ltd | VWC | Zingibali | True | 2009 | submersible | submersible | submersible | vwc | user-group | pay per bucket | per bucket | salty | salty | enough | enough | other | other | unknown | communal standpipe multiple | communal standpipe | functional
6 | 19816 | 0.0 | 2012-10-01 | Dwsp | 0 | DWSP | 33.362410 | -3.766365 | Kwa Ngomho | 0 | Internal | Ishinabulandi | Shinyanga | 17 | 3 | Shinyanga Rural | Samuye | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | swn 80 | swn 80 | handpump | vwc | user-group | never pay | never pay | soft | good | enough | enough | machine dbh | borehole | groundwater | hand pump | hand pump | non functional
7 | 54551 | 0.0 | 2012-10-09 | Rwssp | 0 | DWE | 32.620617 | -4.226198 | Tushirikiane | 0 | Lake Tanganyika | Nyawishi Center | Shinyanga | 17 | 3 | Kahama | Chambo | 0 | True | GeoData Consultants Ltd | NaN | NaN | True | 0 | nira/tanira | nira/tanira | handpump | wug | user-group | unknown | unknown | milky | milky | enough | enough | shallow well | shallow well | groundwater | hand pump | hand pump | non functional
8 | 53934 | 0.0 | 2012-11-03 | Wateraid | 0 | Water Aid | 32.711100 | -5.146712 | Kwa Ramadhan Musa | 0 | Lake Tanganyika | Imalauduki | Tabora | 14 | 6 | Tabora Urban | Itetemia | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | india mark ii | india mark ii | handpump | vwc | user-group | never pay | never pay | salty | salty | seasonal | seasonal | machine dbh | borehole | groundwater | hand pump | hand pump | non functional
9 | 46144 | 0.0 | 2011-08-03 | Isingiro Ho | 0 | Artisan | 30.626991 | -1.257051 | Kwapeto | 0 | Lake Victoria | Mkonomre | Kagera | 18 | 1 | Karagwe | Kaisho | 0 | True | GeoData Consultants Ltd | NaN | NaN | True | 0 | nira/tanira | nira/tanira | handpump | vwc | user-group | never pay | never pay | soft | good | enough | enough | shallow well | shallow well | groundwater | hand pump | hand pump | functional

Last rows

(index) | id | amount_tsh | date_recorded | funder | gps_height | installer | longitude | latitude | wpt_name | num_private | basin | subvillage | region | region_code | district_code | lga | ward | population | public_meeting | recorded_by | scheme_management | scheme_name | permit | construction_year | extraction_type | extraction_type_group | extraction_type_class | management | management_group | payment | payment_type | water_quality | quality_group | quantity | quantity_group | source | source_type | source_class | waterpoint_type | waterpoint_type_group | status_group
59390 | 13677 | 0.0 | 2011-08-04 | Rudep | 1715 | DWE | 31.370848 | -8.258160 | Kwa Mzee Atanas | 0 | Lake Tanganyika | Kitonto | Rukwa | 15 | 2 | Sumbawanga Rural | Mkowe | 150 | True | GeoData Consultants Ltd | VWC | NaN | False | 1991 | swn 80 | swn 80 | handpump | vwc | user-group | never pay | never pay | soft | good | insufficient | insufficient | machine dbh | borehole | groundwater | hand pump | hand pump | functional
59391 | 44885 | 0.0 | 2013-08-03 | Government Of Tanzania | 540 | Government | 38.044070 | -4.272218 | Kwa | 0 | Pangani | Maore Kati | Kilimanjaro | 3 | 3 | Same | Maore | 210 | True | GeoData Consultants Ltd | Water authority | Hingilili | True | 1967 | gravity | gravity | gravity | vwc | user-group | never pay | never pay | soft | good | enough | enough | river | river/lake | surface | communal standpipe | communal standpipe | non functional
59392 | 40607 | 0.0 | 2011-04-15 | Government Of Tanzania | 0 | Government | 33.009440 | -8.520888 | Benard Charles | 0 | Lake Rukwa | Mbuyuni A | Mbeya | 12 | 1 | Chunya | Mbuyuni | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | gravity | gravity | gravity | vwc | user-group | never pay | never pay | soft | good | enough | enough | spring | spring | groundwater | communal standpipe | communal standpipe | non functional
59393 | 48348 | 0.0 | 2012-10-27 | Private | 0 | Private | 33.866852 | -4.287410 | Kwa Peter | 0 | Internal | Masanga | Tabora | 14 | 2 | Igunga | Igunga | 0 | False | GeoData Consultants Ltd | Water authority | NaN | False | 0 | gravity | gravity | gravity | private operator | commercial | pay per bucket | per bucket | soft | good | insufficient | insufficient | dam | dam | surface | other | other | functional
59394 | 11164 | 500.0 | 2011-03-09 | World Bank | 351 | ML appro | 37.634053 | -6.124830 | Chimeredya | 0 | Wami / Ruvu | Komstari | Morogoro | 5 | 6 | Mvomero | Diongoya | 89 | True | GeoData Consultants Ltd | VWC | NaN | True | 2007 | submersible | submersible | submersible | vwc | user-group | pay monthly | monthly | soft | good | enough | enough | machine dbh | borehole | groundwater | communal standpipe | communal standpipe | non functional
59395 | 60739 | 10.0 | 2013-05-03 | Germany Republi | 1210 | CES | 37.169807 | -3.253847 | Area Three Namba 27 | 0 | Pangani | Kiduruni | Kilimanjaro | 3 | 5 | Hai | Masama Magharibi | 125 | True | GeoData Consultants Ltd | Water Board | Losaa Kia water supply | True | 1999 | gravity | gravity | gravity | water board | user-group | pay per bucket | per bucket | soft | good | enough | enough | spring | spring | groundwater | communal standpipe | communal standpipe | functional
59396 | 27263 | 4700.0 | 2011-05-07 | Cefa-njombe | 1212 | Cefa | 35.249991 | -9.070629 | Kwa Yahona Kuvala | 0 | Rufiji | Igumbilo | Iringa | 11 | 4 | Njombe | Ikondo | 56 | True | GeoData Consultants Ltd | VWC | Ikondo electrical water sch | True | 1996 | gravity | gravity | gravity | vwc | user-group | pay annually | annually | soft | good | enough | enough | river | river/lake | surface | communal standpipe | communal standpipe | functional
59397 | 37057 | 0.0 | 2011-04-11 | NaN | 0 | NaN | 34.017087 | -8.750434 | Mashine | 0 | Rufiji | Madungulu | Mbeya | 12 | 7 | Mbarali | Chimala | 0 | True | GeoData Consultants Ltd | VWC | NaN | False | 0 | swn 80 | swn 80 | handpump | vwc | user-group | pay monthly | monthly | fluoride | fluoride | enough | enough | machine dbh | borehole | groundwater | hand pump | hand pump | functional
59398 | 31282 | 0.0 | 2011-03-08 | Malec | 0 | Musa | 35.861315 | -6.378573 | Mshoro | 0 | Rufiji | Mwinyi | Dodoma | 1 | 4 | Chamwino | Mvumi Makulu | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | nira/tanira | nira/tanira | handpump | vwc | user-group | never pay | never pay | soft | good | insufficient | insufficient | shallow well | shallow well | groundwater | hand pump | hand pump | functional
59399 | 26348 | 0.0 | 2011-03-23 | World Bank | 191 | World | 38.104048 | -6.747464 | Kwa Mzee Lugawa | 0 | Wami / Ruvu | Kikatanyemba | Morogoro | 5 | 2 | Morogoro Rural | Ngerengere | 150 | True | GeoData Consultants Ltd | VWC | NaN | True | 2002 | nira/tanira | nira/tanira | handpump | vwc | user-group | pay when scheme fails | on failure | salty | salty | enough | enough | shallow well | shallow well | groundwater | hand pump | hand pump | functional